[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2015-01-05 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265056#comment-14265056
 ] 

Daniel Dai commented on HIVE-7898:
--

You are right. HCatSchema construction already checks for duplicate columns.

A separate issue: TestHCatStorer fails to compile with the patch applied. Can 
you check it?

> HCatStorer should ignore namespaces generated by Pig
>
> Key: HIVE-7898
> URL: https://issues.apache.org/jira/browse/HIVE-7898
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 0.13.1
> Reporter: Justin Leet
> Assignee: Justin Leet
> Priority: Minor
> Attachments: HIVE-7898.1.patch
>
> Currently, Pig aliases must exactly match the names of HCat columns for 
> HCatStorer to be successful.  However, several Pig operations prepend a 
> namespace to the alias in order to differentiate fields (e.g. after a group 
> with field b, you might have A::b).  In this case, even if the fields are in 
> the right order and the alias without namespace matches, the store will fail 
> because it tries to match the long form of the alias, despite the namespace 
> being extraneous information in this case.   Note that multiple aliases can 
> be applied (e.g. A::B::C::d).
> A workaround is possible by doing a 
> FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc.  
> This quickly becomes tedious and bloated for tables with many fields.
> Changing this would normally require care around columns named, for example, 
> `A::b`, which became possible in Hive 0.13.  However, a separate function 
> call only accepts Pig aliases that follow the old rules for Hive column 
> names.  As such, a direct change (rather than attempting to match either the 
> namespace::alias or just alias) maintains compatibility for now.
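
The namespace handling described in the issue can be illustrated with a small 
Java sketch. This is not the code from HIVE-7898.1.patch: the class name 
AliasNamespaceSketch and the method stripNamespace are hypothetical, and the 
only assumption taken from the description is that Pig joins namespace 
components with "::" and that everything before the last "::" is extraneous.

```java
// Illustrative sketch only; names here are hypothetical, not Hive/HCatalog APIs.
class AliasNamespaceSketch {
    // Pig separates namespace components with "::" (e.g. A::B::C::d).
    private static final String SEPARATOR = "::";

    /** Returns the part of a Pig alias after the last "::", or the alias unchanged. */
    public static String stripNamespace(String alias) {
        if (alias == null) {
            return null;
        }
        int idx = alias.lastIndexOf(SEPARATOR);
        return idx < 0 ? alias : alias.substring(idx + SEPARATOR.length());
    }

    public static void main(String[] args) {
        // Single namespace, nested namespaces, and an alias with no namespace.
        System.out.println(stripNamespace("A::b"));
        System.out.println(stripNamespace("A::B::C::d"));
        System.out.println(stripNamespace("field1"));
    }
}
```

Taking the text after the last separator handles the nested case (A::B::C::d) 
with the same logic as the single-namespace case.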



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-12-30 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261274#comment-14261274
 ] 

Justin Leet commented on HIVE-7898:
---

This already happens with my patch: HCatStorer aborts with an error, 
e.g. "Field named  already exists".  The check isn't in HCatBaseStorer 
itself; it occurs during the conversion from Pig Schema to HCatSchema in 
convertPigSchemaToHCatSchema(). The modified getColFromSchema passes the 
now-truncated name, so convertPigSchemaToHCatSchema() attempts to add the 
now-duplicated column, and HCat rejects the duplicate field.
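
The failure mode described above can be sketched in Java as follows. This is 
an illustration under stated assumptions, not the HCatalog implementation: 
DuplicateAliasSketch, stripNamespace, and toColumnNames are hypothetical names, 
the exception type and message are invented for the example, and the real 
check lives in the Pig-to-HCat schema conversion rather than in code like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names here are hypothetical, not Hive/HCatalog APIs.
class DuplicateAliasSketch {
    /** Returns the part of a Pig alias after the last "::", or the alias unchanged. */
    public static String stripNamespace(String alias) {
        int idx = alias.lastIndexOf("::");
        return idx < 0 ? alias : alias.substring(idx + 2);
    }

    /**
     * Strips namespaces from each alias and fails fast if two aliases
     * collapse to the same column name (e.g. A::b and B::b both become b).
     */
    public static List<String> toColumnNames(List<String> pigAliases) {
        Set<String> seen = new LinkedHashSet<>();
        for (String alias : pigAliases) {
            String name = stripNamespace(alias);
            // Set.add returns false when the name is already present.
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                    "Duplicate column name after removing namespace: " + name
                        + " (from alias " + alias + ")");
            }
        }
        return new ArrayList<>(seen);
    }
}
```

In this sketch the store would be rejected rather than silently writing one of 
the colliding fields, which matches the abort behaviour described in the 
comment.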




[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-12-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257389#comment-14257389
 ] 

Daniel Dai commented on HIVE-7898:
--

In general, I am fine with the idea. But after removing the namespace, it is 
possible to get duplicate column names. We should check this scenario in the 
patch and print a warning if it happens.



[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-10-16 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173733#comment-14173733
 ] 

Justin Leet commented on HIVE-7898:
---

Anybody willing to review this? https://reviews.apache.org/r/25140/




[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-28 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113716#comment-14113716
 ] 

Justin Leet commented on HIVE-7898:
---

These two tests both appear to fail locally with or without my changes.  
They're also both well outside anything that hcatalog-pig-adapter, and 
HCatBaseStorer specifically, would be involved in: both run query files, 
not Pig scripts that use the storer.



[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113576#comment-14113576
 ] 

Hive QA commented on HIVE-7898:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664820/HIVE-7898.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6132 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/541/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/541/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-541/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664820
