[jira] [Commented] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-19 Thread David Maughan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760748#comment-15760748
 ] 

David Maughan commented on HIVE-15434:
--

Hi,

Is anyone able to help me with this ticket? The patch is passing but it
seems other tests unrelated to the patch are failing.

- Dave




> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch, HIVE-15434.02.patch
>
>
> h2. Overview
> As stated in the documentation:
> {quote}
> UNIONTYPE support is incomplete The UNIONTYPE datatype was introduced in Hive 
> 0.7.0 (HIVE-537), but full support for this type in Hive remains incomplete. 
> Queries that reference UNIONTYPE fields in JOIN (HIVE-2508), WHERE, and GROUP 
> BY clauses will fail, and Hive does not define syntax to extract the tag or 
> value fields of a UNIONTYPE. This means that UNIONTYPEs are effectively 
> look-at-only.
> {quote}
> It is essential to have a usable uniontype. Until full support is added to 
> Hive, users should at least have the ability to inspect and extract values for 
> further comparison or transformation.
> h2. Proposal
> I propose to add a GenericUDF that has 2 modes of operation. Consider the 
> following schema and data that contains a union:
> Schema:
> {code}
> struct<field1:uniontype<int,string>>
> {code}
> Query:
> {code}
> hive> select field1 from thing;
> {0:0}
> {1:"one"}
> {code}
> h4. Explode to Struct
> This method will recursively convert all unions within the type to structs 
> with fields named {{tag_n}}, {{n}} being the tag number. Only the {{tag_*}} 
> field that matches the tag of the union will be populated with the value. In 
> the case above the schema of field1 will be converted to:
> {code}
> struct<tag_0:int,tag_1:string>
> {code}
> {code}
> hive> select extract_union(field1) from thing;
> {"tag_0":0,"tag_1":null}
> {"tag_0":null,"tag_1":one}
> {code}
> {code}
> hive> select extract_union(field1).tag_0 from thing;
> 0
> null
> {code}
> h4. Extract the specified tag
> This method will simply extract the value of the specified tag. If the tag 
> number matches, the value is returned; otherwise, null is returned.
> {code}
> hive> select extract_union(field1, 0) from thing;
> 0
> null
> {code}
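
The two modes described in the issue can be modelled outside Hive. The sketch below is a plain-Java illustration of the semantics only, not the code in the attached patches: the class `ExtractUnionSketch` and its method names are hypothetical, a union value is represented simply as a tag plus an object, and only a single, non-nested union is handled (the recursive conversion mentioned in the description is left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the two extract_union modes; not taken from the HIVE-15434 patch. */
public class ExtractUnionSketch {

    /** Mode 1: explode a union (tag, value) with tagCount alternatives into tag_n fields. */
    public static Map<String, Object> explode(int tag, Object value, int tagCount) {
        Map<String, Object> struct = new LinkedHashMap<>();
        for (int n = 0; n < tagCount; n++) {
            // Only the field whose index equals the union's tag carries the value.
            struct.put("tag_" + n, n == tag ? value : null);
        }
        return struct;
    }

    /** Mode 2: return the value if the union holds the requested tag, otherwise null. */
    public static Object extract(int tag, Object value, int requestedTag) {
        return tag == requestedTag ? value : null;
    }

    public static void main(String[] args) {
        // The two rows from the example in the issue: {0:0} and {1:"one"}.
        System.out.println(explode(0, 0, 2));
        System.out.println(explode(1, "one", 2));
        System.out.println(extract(0, 0, 0));
        System.out.println(extract(1, "one", 0));
    }
}
```

The `main` method feeds in the two example rows from the description, so its output can be compared against the `extract_union(field1)` and `extract_union(field1, 0)` results quoted above.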



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Attachment: (was: HIVE-12158.1.patch)

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.2.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However, it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List<HCatPartition> partitions) throws 
> HCatException
> {code}
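
A rough illustration of the delegation described above, assuming the three-argument form `alter_partitions(dbName, tblName, partitions)` on the metastore client. Because that call targets one table at a time, the sketch groups the incoming list by database and table before delegating; the grouping is an assumption of this sketch, not something stated in the issue. `PartitionRef` and `MetaStore` are hypothetical stand-ins so that the example runs without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of updatePartitions delegating to a metastore client. */
public class UpdatePartitionsSketch {

    /** Stand-in for HCatPartition: just enough to identify the owning table. */
    public static final class PartitionRef {
        final String db;
        final String table;
        final String location;

        public PartitionRef(String db, String table, String location) {
            this.db = db;
            this.table = table;
            this.location = location;
        }
    }

    /** Stand-in for the metastore client's alter_partitions call. */
    public interface MetaStore {
        void alterPartitions(String db, String table, List<PartitionRef> partitions);
    }

    /** Groups partitions by db.table, delegates once per table, returns the call count. */
    public static int updatePartitions(MetaStore client, List<PartitionRef> partitions) {
        Map<String, List<PartitionRef>> byTable = new LinkedHashMap<>();
        for (PartitionRef p : partitions) {
            byTable.computeIfAbsent(p.db + "." + p.table, k -> new ArrayList<>()).add(p);
        }
        for (List<PartitionRef> group : byTable.values()) {
            PartitionRef first = group.get(0);
            client.alterPartitions(first.db, first.table, group);
        }
        return byTable.size();
    }

    public static void main(String[] args) {
        List<PartitionRef> parts = Arrays.asList(
                new PartitionRef("db1", "t1", "/data/t1/a"),
                new PartitionRef("db1", "t2", "/data/t2/a"),
                new PartitionRef("db1", "t1", "/data/t1/b"));
        // A logging client stands in for the real metastore.
        int calls = updatePartitions(
                (db, table, ps) -> System.out.println(db + "." + table + " <- " + ps.size()),
                parts);
        System.out.println("delegated calls: " + calls);
    }
}
```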





[jira] [Commented] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-16 Thread David Maughan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754021#comment-15754021
 ] 

David Maughan commented on HIVE-15434:
--

These failures seem to be completely unrelated to this patch.

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch, HIVE-15434.02.patch
>
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Status: Patch Available  (was: In Progress)

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch, HIVE-15434.02.patch
>
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Attachment: HIVE-15434.02.patch

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch, HIVE-15434.02.patch
>
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Status: In Progress  (was: Patch Available)

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch
>
>





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Status: Patch Available  (was: In Progress)

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch, HIVE-12158.2.patch
>
>





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-12-16 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Status: In Progress  (was: Patch Available)

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch, HIVE-12158.2.patch
>
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Attachment: HIVE-15434.01.patch

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
> Attachments: HIVE-15434.01.patch
>
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Assignee: David Maughan
Release Note: Added UDF to allow interrogation of uniontype values.
  Status: Patch Available  (was: Open)

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Assignee: David Maughan
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Description: 
h2. Overview

As stated in the documentation:

{quote}
UNIONTYPE support is incomplete The UNIONTYPE datatype was introduced in Hive 
0.7.0 (HIVE-537), but full support for this type in Hive remains incomplete. 
Queries that reference UNIONTYPE fields in JOIN (HIVE-2508), WHERE, and GROUP 
BY clauses will fail, and Hive does not define syntax to extract the tag or 
value fields of a UNIONTYPE. This means that UNIONTYPEs are effectively 
look-at-only.
{quote}

It is essential to have a usable uniontype. Until full support is added to Hive, 
users should at least have the ability to inspect and extract values for 
further comparison or transformation.

h2. Proposal

I propose to add a GenericUDF that has 2 modes of operation. Consider the 
following schema and data that contains a union:

Schema:

{code}
struct<field1:uniontype<int,string>>
{code}

Query:

{code}
hive> select field1 from thing;
{0:0}
{1:"one"}
{code}

h4. Explode to Struct

This method will recursively convert all unions within the type to structs with 
fields named {{tag_n}}, {{n}} being the tag number. Only the {{tag_*}} field 
that matches the tag of the union will be populated with the value. In the case 
above the schema of field1 will be converted to:

{code}
struct<tag_0:int,tag_1:string>
{code}

{code}
hive> select extract_union(field1) from thing;
{"tag_0":0,"tag_1":null}
{"tag_0":null,"tag_1":one}
{code}

{code}
hive> select extract_union(field1).tag_0 from thing;
0
null
{code}

h4. Extract the specified tag

This method will simply extract the value of the specified tag. If the tag 
number matches, the value is returned; otherwise, null is returned.

{code}
hive> select extract_union(field1, 0) from thing;
0
null
{code}

  was:
h2. Overview

As stated in the documentation:

{quote}
UNIONTYPE support is incomplete The UNIONTYPE datatype was introduced in Hive 
0.7.0 (HIVE-537), but full support for this type in Hive remains incomplete. 
Queries that reference UNIONTYPE fields in JOIN (HIVE-2508), WHERE, and GROUP 
BY clauses will fail, and Hive does not define syntax to extract the tag or 
value fields of a UNIONTYPE. This means that UNIONTYPEs are effectively 
look-at-only.
{quote}

It would be useful to have a UDF that allows extraction of values for further 
comparison or transformation.

h2. Proposal

I propose to add a GenericUDF that has 2 modes of operation. Consider the 
following schema and data that contains a union:

Schema:

{code}
struct<field1:uniontype<int,string>>
{code}

Query:

{code}
hive> select field1 from thing;
{0:0}
{1:"one"}
{code}

h4. Explode to Struct

This method will recursively convert all unions within the type to structs with 
fields named {{tag_n}}, {{n}} being the tag number. Only the {{tag_*}} field 
that matches the tag of the union will be populated with the value. In the case 
above the schema of field1 will be converted to:

{code}
struct<tag_0:int,tag_1:string>
{code}

{code}
hive> select extract_union(field1) from thing;
{"tag_0":0,"tag_1":null}
{"tag_0":null,"tag_1":one}
{code}

{code}
hive> select extract_union(field1).tag_0 from thing;
0
null
{code}

h4. Extract the specified tag

This method will simply extract the value of the specified tag. If the tag 
number matches, the value is returned; otherwise, null is returned.

{code}
hive> select extract_union(field1, 0) from thing;
0
null
{code}


> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>

[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Priority: Major  (was: Minor)

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>





[jira] [Updated] (HIVE-15434) Add UDF to allow interrogation of uniontype values

2016-12-15 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15434:
-
Summary: Add UDF to allow interrogation of uniontype values  (was: Hive 
GenericUDF to make uniontype more usable)

> Add UDF to allow interrogation of uniontype values
> --
>
> Key: HIVE-15434
> URL: https://issues.apache.org/jira/browse/HIVE-15434
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Affects Versions: 2.1.1
>Reporter: David Maughan
>Priority: Minor
>





[jira] [Updated] (HIVE-15328) Inconsistent/incorrect handling of NULL in nested structs

2016-12-01 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15328:
-
Description: 
h2. Overview

Performing {{IS NULL}} checks against a null struct that is generated as part 
of a UDF correctly returns {{true}}. However, the same check against the same 
null struct that has been persisted to a table incorrectly returns {{false}}. 
Additionally, when a child field of the null struct is inspected in the same 
query, the result of the null check on the struct itself reverses itself to 
{{true}}.

The issue does not appear to be dependent on the storage format of the table as 
the same result is repeated with TEXTFILE, PARQUET, ORC and AVRO.

h2. Example

In this example I have used {{if(1=1, null, named_struct('c', 1))}} as a crude 
method of generating a simple null struct.

h4. 'b' is correctly reported as {{true}}.
{code}
hive> select
>   b is null,
>   b
> from (
>   select
>     if(1=1, null, named_struct('c', 1)) as b
>   ) as a;
OK
true    NULL
{code}

h4. 'b' is correctly reported as {{true}} when also inspecting 'b.c'.
{code}
hive>
> select
>   b is null,
>   b.c is null,
>   b
> from (
>   select
>     if(1=1, null, named_struct('c', 1)) as b
>   ) as a;
OK
true    true    NULL
{code}

h4. Persist the data to a table
{code}
hive>
> create table a
>   as
> select
>   if(1=1, null, named_struct('c', 1)) as b;
OK
{code}

h4. 'b' is incorrectly reported as {{false}}.
{code}
hive>
> select
>   b is null,
>   b
> from a;
OK
false   NULL
{code}

h4. 'b' is now correctly reported as {{true}} when also inspecting 'b.c'.
{code}
hive>
> select
>   b is null,
>   b.c is null,
>   b
> from a;
OK
true    true    NULL
{code}

  was:
h2. Overview

Performing {{IS NULL}} checks against a null struct that is generated as part 
of a UDF correctly returns {{true}}. However, the same check against the same 
null struct that has been persisted to a table incorrectly returns {{false}}. 
Additionally, when a child field of the null struct is inspected in the same 
query, the result of the null check on the struct itself reverses itself to 
{{true}}.

The issue does not appear to be dependent on the storage format of the table as 
the same result is repeated with TEXTFILE, PARQUET, ORC and AVRO.

h2. Example

In this example I have used {{if(1=1, null, named_struct('c', 1)}} as a crude 
method of generating a simple null struct.

h4. 'b' is correctly reported as {{true}}.
{code}
hive> select
>   b is null,
>   b
> from (
>   select
>     if(1=1, null, named_struct('c', 1)) as b
>   ) as a;
OK
true    NULL
{code}

h4. 'b' is correctly reported as {{true}} when also inspecting 'b.c'.
{code}
hive>
> select
>   b is null,
>   b.c is null,
>   b
> from (
>   select
>     if(1=1, null, named_struct('c', 1)) as b
>   ) as a;
OK
true    true    NULL
{code}

h4. Persist the data to a table
{code}
hive>
> create table a
>   as
> select
>   if(1=1, null, named_struct('c', 1)) as b;
OK
{code}

h4. 'b' is incorrectly reported as {{false}}.
{code}
hive>
> select
>   b is null,
>   b
> from a;
OK
false   NULL
{code}

h4. 'b' is now correctly reported as {{true}} when also inspecting 'b.c'.
{code}
hive>
> select
>   b is null,
>   b.c is null,
>   b
> from a;
OK
true    true    NULL
{code}


> Inconsistent/incorrect handling of NULL in nested structs
> -
>
> Key: HIVE-15328
> URL: https://issues.apache.org/jira/browse/HIVE-15328
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: David Maughan
>
> h2. Overview
> Performing {{IS NULL}} checks against a null struct that is generated as part 
> of a UDF correctly returns {{true}}. However, the same check against the same 
> null struct that has been persisted to a table incorrectly returns {{false}}. 
> Additionally, when a child field of the null struct is inspected in the same 
> query, the result of the null check on the struct itself reverses itself to 
> {{true}}.
> The issue does not appear to be dependent on the storage format of the table 
> as the same result is repeated with TEXTFILE, PARQUET, ORC and AVRO.
> h2. Example
> In this example I have used {{if(1=1, null, named_struct('c', 1))}} as a 
> crude method of generating a simple null struct.
> h4. 'b' is correctly reported as {{true}}.
> {code}
> hive> select
> >   b is null,
> >   b
> > from (
> >   select
> >     if(1=1, null, named_struct('c', 1)) as b
> >   ) as a;
> OK
> true  NULL
> {code}
> h4. 'b' is correctly reported as {{true}} when also inspecting 'b.c'.
> {code}
> hive>
> > 

[jira] [Updated] (HIVE-15316) CTAS STORED AS AVRO: AvroTypeException Found default.record_0, expecting union

2016-11-30 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15316:
-
Description: 
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is expecting a union, it is specifically expecting a null or a 
record named {{record_0}} but it finds {{default.record_0}}.

I believe this is a bug in Avro. I'm not sure whether the correct behaviour is 
to cascade the namespace down or not but it is definitely an inconsistency 
between creating a schema via the builders and parser. I've created 
[AVRO-1965|https://issues.apache.org/jira/browse/AVRO-1965] for this. However, 
I believe that defensively passing the schema through the Avro Schema Parser on 
a select query would fix this issue in Hive without an Avro fix and version 
bump in Hive.
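
The mismatch described above can be illustrated with a small, self-contained Java sketch. It is a deliberately simplified model of matching a written record against the branches of a reader union by full name, not Avro's actual resolution code, and the names AvroNameMatchSketch, fullName and matchBranch are hypothetical. It relies only on the Avro rule that a full name is the namespace and the name joined by a dot.

```java
import java.util.Arrays;
import java.util.List;

final class AvroNameMatchSketch {

    /** Avro-style full name: "namespace.name", or just "name" when no namespace is set. */
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    /** Index of the reader union branch whose full name equals the writer's, or -1 if none. */
    static int matchBranch(List<String> readerBranches, String writerFullName) {
        return readerBranches.indexOf(writerFullName);
    }

    public static void main(String[] args) {
        // Schema stored in the file: the parser placed record_0 in namespace "default".
        String writer = fullName("default", "record_0");

        // Reader schema regenerated from the Hive columns without the parser: no namespace.
        List<String> readerWithoutParser = Arrays.asList("null", fullName(null, "record_0"));
        // Reader schema after the same round trip through the parser (assumed outcome).
        List<String> readerWithParser = Arrays.asList("null", fullName("default", "record_0"));

        System.out.println(matchBranch(readerWithoutParser, writer)); // -1: no branch matches
        System.out.println(matchBranch(readerWithParser, writer));    // 1: the record branch
    }
}
```

In this model the first lookup fails for the same reason the query does: the union offers null or record_0, while the file contains default.record_0.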

  was:
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is expecting a union, it is 

[jira] [Updated] (HIVE-15316) CTAS STORED AS AVRO: AvroTypeException Found default.record_0, expecting union

2016-11-30 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15316:
-
Description: 
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is expecting a union, it is specifically expecting a null or a 
record named {{record_0}} but it finds {{default.record_0}}.

I believe this is a bug in Avro. I'm not sure whether correct behaviour is to 
cascade the namespace down or not but it is definitely an inconsistency between 
creating a schema via the builders and parser. I've created 
[AVRO-1965|https://issues.apache.org/jira/browse/AVRO-1965] for this. However, 
I believe that defensively passing the schema through the Avro Schema Parser on 
a select query would fix this issue in Hive without an Avro fix and version 
bump in Hive.

  was:
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is a expecting a union, it is 

[jira] [Updated] (HIVE-15316) CTAS STORED AS AVRO: AvroTypeException Found default.record_0, expecting union

2016-11-30 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-15316:
-
Description: 
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is a expecting a union, it is specifically expecting a null or a 
record named {{record_0}} but it finds {{default.record_0}}.

I believe this is a bug in Avro. I'm not sure whether correct behaviour is to 
cascade the namespace down or not but it is definitely an inconsistency between 
creating a schema via the builders and parser. I've created 
[AVRO-1965|https://issues.apache.org/jira/browse/AVRO-1965] for this. However, 
I believe that defensively passing the schema through the Avro Schema Parser on 
a select query would fix this issue in Hive without an Avro fix and version 
bump in Hive.

  was:
There's an issue when querying a table that has been created as Avro via CTAS 
when the target struct is at least 2 struct-levels deep. It can be replicated 
with the following steps:

{code}
CREATE TABLE a
  STORED AS AVRO
  AS
SELECT named_struct('c', named_struct('d', 1)) as b;

SELECT b FROM a;

org.apache.avro.AvroTypeException: Found default.record_0, expecting union
{code}

The reason for this is that during table creation, the Avro schema is generated 
from the Hive columns in {{AvroSerDe}} and then passed through the Avro Schema 
Parser: {{new Schema.Parser().parse(schema.toString())}}. For the above 
example, this creates the below schema in the Avro file. Note that the lowest 
level struct, {{record_0}} has {{"namespace": "default"}}.

{code}
{
  "type": "record",
  "name": "a",
  "namespace": "default",
  "fields": [
    {
      "name": "b",
      "type": [
        "null",
        {
          "type": "record",
          "name": "record_1",
          "namespace": "",
          "doc": "struct",
          "fields": [
            {
              "name": "c",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "record_0",
                  "namespace": "default",
                  "doc": "struct",
                  "fields": [
                    {
                      "name": "d",
                      "type": [ "null", "int" ],
                      "doc": "int",
                      "default": null
                    }
                  ]
                }
              ],
              "doc": "struct",
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
{code}

On a subsequent select query, the Avro schema is again generated from the Hive 
columns. However, this time it is not passed through the Avro Schema Parser and 
the {{namespace}} attribute is not present in {{record_0}}. The actual Error 
message _"Found default.record_0, expecting union"_ is slightly misleading. 
Although it is a expected a union, it is 

[jira] [Commented] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-09-29 Thread David Maughan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533205#comment-15533205
 ] 

David Maughan commented on HIVE-12158:
--

Hi [~mithun], [~sushanth],

Apologies for the long delay. I've addressed the problem and attached a new 
patch.

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch, HIVE-12158.2.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-09-29 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Attachment: HIVE-12158.2.patch

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch, HIVE-12158.2.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}





[jira] [Commented] (HIVE-12158) Add methods to HCatClient for partition synchronization

2016-02-10 Thread David Maughan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141031#comment-15141031
 ] 

David Maughan commented on HIVE-12158:
--

Hi [~sushanth], are you able to advise how to move this ticket along?

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}





[jira] [Commented] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-28 Thread David Maughan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977997#comment-14977997
 ] 

David Maughan commented on HIVE-12158:
--

I don't think the failure is due to this patch.

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-27 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Description: 
We have a use case where we have a list of partitions that are created as a 
result of a batch job (new or updated) outside of Hive and would like to 
synchronize them with the Hive MetaStore. We would like to use the HCatalog 
{{HCatClient}} but it currently does not seem to support this. However it is 
possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
following methods to {{HCatClient}} and {{HCatClientHMSImpl}}:

1. A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(List partitions) throws 
HCatException
{code}

2. A method for altering or adding partitions depending on whether they already 
exist or not. The implementation would split the given list into a list of 
existing partitions (using {{HiveMetaStoreClient#getPartitionsByNames}} and 
{{Warehouse#makePartName}} to determine existence), and a list of new 
partitions. Then the appropriate add/update calls would be issued:

{code}
public void addOrUpdatePartitions(List partitions) throws 
HCatException
{code}

Are these acceptable? Are there any standards that should be followed here?
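
The split step in the second method can be sketched roughly as follows. This is a minimal illustration, not the patch itself: partitions are represented by their names as plain strings, existingNames stands in for whatever a metastore lookup would return, and PartitionSplitSketch, Split and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PartitionSplitSketch {

    /** Holds the two lists the proposal describes: partitions to alter and partitions to add. */
    static final class Split {
        final List<String> toUpdate = new ArrayList<>();
        final List<String> toAdd = new ArrayList<>();
    }

    /**
     * Splits the requested partition names by whether they already exist.
     * existingNames is an assumed input standing in for the metastore lookup.
     */
    static Split split(List<String> requested, Set<String> existingNames) {
        Split result = new Split();
        for (String name : requested) {
            if (existingNames.contains(name)) {
                result.toUpdate.add(name);
            } else {
                result.toAdd.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("ds=2015-10-26"));
        List<String> requested = Arrays.asList("ds=2015-10-26", "ds=2015-10-27");
        Split s = split(requested, existing);
        System.out.println("update: " + s.toUpdate); // update: [ds=2015-10-26]
        System.out.println("add: " + s.toAdd);       // add: [ds=2015-10-27]
    }
}
```

Each list would then be passed to the corresponding update or add call, so a single request never mixes the two operations.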

  was:
We have a use case where we have a list of partitions that are created as a 
result of a batch job (new or updated) outside of Hive and would like to 
synchronize them with the Hive MetaStore. We would like to use the HCatalog 
{{HCatClient}} but it currently does not seem to support this. However it is 
possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
following methods to {{HCatClient}} and {{HCatClientHMSImpl}}:

1. A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(String dbName, String tableName, 
List partitions) throws HCatException
{code}

2. A method for altering or adding partitions depending on whether they already 
exist or not. The implementation would split the given list into a list of 
existing partitions (using {{HiveMetaStoreClient#getPartitionsByNames}} and 
{{Warehouse#makePartName}} to determine existence), and a list of new 
partitions. Then the appropriate add/update calls would be issued:

{code}
public void addOrUpdatePartitions(String dbName, String tableName, 
List partitions) throws HCatException
{code}

Are these acceptable? Are there any standards that should be followed here?


> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: David Maughan
>Priority: Minor
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following methods to {{HCatClient}} and {{HCatClientHMSImpl}}:
> 1. A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}
> 2. A method for altering or adding partitions depending on whether they 
> already exist or not. The implementation would split the given list into a 
> list of existing partitions (using 
> {{HiveMetaStoreClient#getPartitionsByNames}} and {{Warehouse#makePartName}} 
> to determine existence), and a list of new partitions. Then the appropriate 
> add/update calls would be issued:
> {code}
> public void addOrUpdatePartitions(List partitions) throws 
> HCatException
> {code}
> Are these acceptable? Are there any standards that should be followed here?





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-27 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Description: 
We have a use case where we have a list of partitions that are created as a 
result of a batch job (new or updated) outside of Hive and would like to 
synchronize them with the Hive MetaStore. We would like to use the HCatalog 
{{HCatClient}} but it currently does not seem to support this. However it is 
possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
following method to {{HCatClient}} and {{HCatClientHMSImpl}}:

A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(List partitions) throws 
HCatException
{code}


  was:
We have a use case where we have a list of partitions that are created as a 
result of a batch job (new or updated) outside of Hive and would like to 
synchronize them with the Hive MetaStore. We would like to use the HCatalog 
{{HCatClient}} but it currently does not seem to support this. However it is 
possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
following methods to {{HCatClient}} and {{HCatClientHMSImpl}}:

1. A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(List partitions) throws 
HCatException
{code}

2. A method for altering or adding partitions depending on whether they already 
exist or not. The implementation would split the given list into a list of 
existing partitions (using {{HiveMetaStoreClient#getPartitionsByNames}} and 
{{Warehouse#makePartName}} to determine existence), and a list of new 
partitions. Then the appropriate add/update calls would be issued:

{code}
public void addOrUpdatePartitions(List partitions) throws 
HCatException
{code}

Are these acceptable? Are there any standards that should be followed here?


> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Reporter: David Maughan
>Priority: Minor
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws 
> HCatException
> {code}





[jira] [Assigned] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-27 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan reassigned HIVE-12158:


Assignee: David Maughan

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 2.0.0
> Reporter: David Maughan
> Assignee: David Maughan
> Priority: Minor
> Labels: hcatalog
> Attachments: HIVE-12158.1.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws HCatException
> {code}





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-27 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Attachment: HIVE-12158.1.patch

Attached patch [^HIVE-12158.1.patch]

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Reporter: David Maughan
> Priority: Minor
> Attachments: HIVE-12158.1.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List partitions) throws HCatException
> {code}





[jira] [Updated] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-13 Thread David Maughan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Maughan updated HIVE-12158:
-
Description: 
We have a use case where a batch job outside of Hive creates or updates a
list of partitions, and we would like to synchronize them with the Hive
MetaStore. We would like to use the HCatalog {{HCatClient}}, but it does not
currently seem to support this. However, it is possible with the
{{HiveMetaStoreClient}} directly. I am proposing to add the following methods
to {{HCatClient}} and {{HCatClientHMSImpl}}:

1. A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(String dbName, String tableName,
    List partitions) throws HCatException
{code}

2. A method for altering or adding partitions depending on whether they already
exist. The implementation would split the given list into a list of
existing partitions (using {{HiveMetaStoreClient#getPartitionsByNames}} and
{{Warehouse#makePartName}} to determine existence) and a list of new
partitions, and would then issue the appropriate add or update calls:

{code}
public void addOrUpdatePartitions(String dbName, String tableName,
    List partitions) throws HCatException
{code}

Are these acceptable? Are there any standards that should be followed here?
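
The split described in item 2 can be sketched without any Hive dependency.
The following is a minimal, hypothetical illustration only: the class
PartitionSplitSketch, its Split holder and its partName helper are invented
for this sketch and are not part of HCatalog or of any attached patch.
Partitions are modelled as ordered key/value maps, partName only imitates the
key=value/key=value naming that {{Warehouse#makePartName}} is assumed to
produce (no escaping is done), and the set of existing names stands in for
the result of a {{HiveMetaStoreClient#getPartitionsByNames}} lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the "existing vs. new" split; uses no Hive classes. */
class PartitionSplitSketch {

    /** Holds the two lists produced by the split. */
    static final class Split {
        final List<Map<String, String>> toUpdate = new ArrayList<>();
        final List<Map<String, String>> toAdd = new ArrayList<>();
    }

    /**
     * Builds a name such as "ds=2015-10-13/hr=01" from ordered key/value pairs.
     * Assumption: this only imitates the metastore naming scheme and skips escaping.
     */
    static String partName(Map<String, String> spec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /**
     * Splits the given partitions into those whose name is already known
     * (to be altered) and those that are not (to be added).
     * existingNames stands in for the names returned by a metastore lookup.
     */
    static Split split(List<Map<String, String>> partitions, Set<String> existingNames) {
        Split result = new Split();
        for (Map<String, String> p : partitions) {
            if (existingNames.contains(partName(p))) {
                result.toUpdate.add(p);
            } else {
                result.toAdd.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> known = new LinkedHashMap<>();
        known.put("ds", "2015-10-13");
        Map<String, String> fresh = new LinkedHashMap<>();
        fresh.put("ds", "2015-10-14");

        Set<String> existing = new HashSet<>();
        existing.add("ds=2015-10-13");

        Split result = split(Arrays.asList(known, fresh), existing);
        System.out.println("to update: " + result.toUpdate);
        System.out.println("to add:    " + result.toAdd);
    }
}
```

In a real implementation the two lists would then be handed to the
metastore's alter and add calls respectively. How failures part-way through
should be handled is not addressed by this sketch.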

  was:
We have a use case where a batch job outside of Hive creates or updates a
list of partitions, and we would like to synchronize them with the Hive
MetaStore. We would like to use the HCatalog {{HCatClient}}, but it does not
currently seem to support this. However, it is possible with the
{{HiveMetaStoreClient}} directly. I am proposing to add the following methods
to {{HCatClient}} and {{HCatClientHMSImpl}}:

1. A method for altering partitions. The implementation would delegate to 
{{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of "alter" 
in the name so it's consistent with the {{HCatClient#updateTableSchema}} method.

{code}
public void updatePartitions(List partitions) throws HCatException
{code}

2. A method for altering or adding partitions depending on whether they already
exist. The implementation would split the given list into a list of
existing partitions (using {{HiveMetaStoreClient#getPartitionsByNames}} and
{{Warehouse#makePartName}} to determine existence) and a list of new
partitions, and would then issue the appropriate add or update calls:

{code}
public void addOrUpdatePartitions(List partitions) throws HCatException
{code}

Are these acceptable? Are there any standards that should be followed here?


> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Reporter: David Maughan
> Priority: Minor
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following methods to {{HCatClient}} and {{HCatClientHMSImpl}}:
> 1. A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(String dbName, String tableName,
>     List partitions) throws HCatException
> {code}
> 2. A method for altering or adding partitions depending on whether they 
> already exist or not. The implementation would split the given list into a 
> list of existing partitions (using 
> {{HiveMetaStoreClient#getPartitionsByNames}} and {{Warehouse#makePartName}} 
> to determine existence), and a list of new partitions. Then the appropriate 
> add/update calls would be issued:
> {code}
> public void addOrUpdatePartitions(String dbName, String tableName,
>     List partitions) throws HCatException
> {code}
> Are these acceptable? Are there any standards that should be followed here?


