Hive Contributor Request
Hi all, I'd like to be added as a Hive contributor. My Jira username is justinleet. Thanks, Justin
Review Request 25140: HIVE-7898 HCatStorer should ignore namespaces generated by Pig
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25140/
---

Review request for hive.

Bugs: HIVE-7898
    https://issues.apache.org/jira/browse/HIVE-7898

Repository: hive-git

Description
---
Namespaces in Pig aliases should be ignored, and the original alias should be used when matching Pig fields to HCat columns.

Diffs
---
  hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatBaseStorer.java ae60030
  hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorer.java fcfc642

Diff: https://reviews.apache.org/r/25140/diff/

Testing
---
Added unit tests for storing without namespaces.

Thanks,
Justin Leet
HIVE-21894 review?
I've had a PR out for a while that adds SSL support to the KafkaStorageHandler without relying on plaintext table properties. It has been waiting on both a general review and specific feedback on a few questions (detailed on the PR itself). Would someone be able to help get this pushed across the finish line? https://issues.apache.org/jira/browse/HIVE-21894 https://github.com/apache/hive/pull/839
Re: HIVE-21894 review?
The stale filter on GitHub caught this PR, and I'm still looking for a review. Do I need to open a new PR, or can someone reopen the existing one for me? This is still a PR I'm willing to work on and that I think provides value for the community, but I need feedback from contributors to move it forward.

On Tue, Jun 2, 2020 at 3:31 PM Justin Leet wrote:
> I've had a PR out for awhile for SSL with the KafkaStorageHandler that
> isn't plaintext table properties, that's been waiting for both general
> review and specific feedback for a few questions (detailed on the PR
> itself).
>
> Would someone be able to help get this pushed across the finish line?
>
> https://issues.apache.org/jira/browse/HIVE-21894
> https://github.com/apache/hive/pull/839
[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173733#comment-14173733 ] Justin Leet commented on HIVE-7898: --- Anybody willing to review this? https://reviews.apache.org/r/25140/ > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.1 >Reporter: Justin Leet >Assignee: Justin Leet >Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
Justin Leet created HIVE-7898:
---
Summary: HCatStorer should ignore namespaces generated by Pig
Key: HIVE-7898
URL: https://issues.apache.org/jira/browse/HIVE-7898
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.13.1
Reporter: Justin Leet
Assignee: Justin Leet
Priority: Minor

Currently, Pig aliases must exactly match the names of HCat columns for HCatStorer to be successful. However, several Pig operations prepend a namespace to the alias in order to differentiate fields (e.g. after a group with field b, you might have A::b). In this case, even if the fields are in the right order and the alias without namespace matches, the store will fail because it tries to match the long form of the alias, despite the namespace being extraneous information in this case. Note that multiple aliases can be applied (e.g. A::B::C::d).

A workaround is possible by doing a
FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc.
This quickly becomes tedious and bloated for tables with many fields.

Changing this would normally require care around columns named, for example, `A::b` as has been introduced in Hive 13. However, a different function call only validates Pig aliases if they follow the old rules for Hive columns. As such, a direct change (rather than attempting to match either the namespace::alias or just alias) maintains compatibility for now.

--
This message was sent by Atlassian JIRA (v6.2#6252)
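To make the proposed matching concrete, here is a minimal Java sketch of stripping Pig-generated namespaces from an alias. It is not the code from the attached patch: the class name `AliasNamespaces` and the method `stripNamespace` are hypothetical, and the only thing taken from the issue is the `::` separator and the example aliases.

```java
// Hypothetical helper for illustration; not taken from HCatBaseStorer
// or from the HIVE-7898 patch.
final class AliasNamespaces {
    // Separator Pig places between a namespace and an alias, e.g. A::b.
    private static final String SEPARATOR = "::";

    private AliasNamespaces() {}

    /**
     * Returns the part of the alias after the last "::",
     * or the alias unchanged if it carries no namespace.
     */
    static String stripNamespace(String alias) {
        if (alias == null) {
            return null;
        }
        int idx = alias.lastIndexOf(SEPARATOR);
        return idx < 0 ? alias : alias.substring(idx + SEPARATOR.length());
    }

    public static void main(String[] args) {
        // Aliases taken from the examples in the issue description.
        for (String alias : new String[] {"b", "A::b", "A::B::C::d"}) {
            System.out.println(alias + " -> " + stripNamespace(alias));
        }
    }
}
```

Using the last separator rather than the first is what lets nested namespaces such as A::B::C::d collapse to the plain alias in one step.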
[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Leet updated HIVE-7898: -- Attachment: HIVE-7898.1.patch > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.1 > Reporter: Justin Leet >Assignee: Justin Leet >Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Leet updated HIVE-7898: -- Status: Patch Available (was: Open) Stripped namespace from alias throughout HCatBaseStorer. Added unit tests for storing after performing an operation that gives aliases a namespace. > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.1 > Reporter: Justin Leet >Assignee: Justin Leet >Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113716#comment-14113716 ] Justin Leet commented on HIVE-7898: --- These two tests both appear to fail locally with or without my changes. They're also both well outside anything that hcatalog-pig-adapter, and specifically HCatBaseStorer, would be involved in: both run query files rather than Pig scripts that use the storer. > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog > Affects Versions: 0.13.1 > Reporter: Justin Leet > Assignee: Justin Leet > Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-21861) ClassCastException during CTAS over external table using KafkaStorageHandler
Justin Leet created HIVE-21861:
---
Summary: ClassCastException during CTAS over external table using KafkaStorageHandler
Key: HIVE-21861
URL: https://issues.apache.org/jira/browse/HIVE-21861
Project: Hive
Issue Type: Bug
Components: kafka integration
Affects Versions: 0.3.0
Reporter: Justin Leet

To reproduce, create a table similar to the following:
{code}
CREATE EXTERNAL TABLE (raw_value STRING)
ROW FORMAT DELIMITED LINES TERMINATED BY '\n'
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES(
  "kafka.topic"="",
  "kafka.bootstrap.servers"="",
  "kafka.consumer.security.protocol"="PLAINTEXT",
  "kafka.serde.class"="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
{code}
Note the SerDe isn't the default SerDe. Additionally, this error occurs when vectorization is enabled.

Basic queries work fine:
{code}
SELECT * FROM LIMIT 1;
{code}
Doing a CTAS to bring it into a managed table fails:
{code}
CREATE TABLE AS SELECT * FROM ;
{code}
The exception is:
{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
  at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:471)
  at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
  at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.readNextBatch(VectorizedKafkaRecordReader.java:159)
  at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:113)
  at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:47)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
  ... 24 more
{code}
A workaround to this is to disable vectorization via:
{code}
set hive.vectorized.execution.enabled = false;
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261275#comment-14261275 ] Justin Leet commented on HIVE-7898: --- This actually already happens in my patch. HCatStorer will abort with an error, e.g. "Field named already exists". This isn't specifically in HCatBaseStorer; it occurs during the conversion from Pig Schema to HCatSchema in convertPigSchemaToHCatSchema(). The modified getColFromSchema will pass the now-truncated name, so convertPigSchemaToHCatSchema() will attempt to add the now-duplicated column, and HCat won't allow the duplicated field to go through. > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog > Affects Versions: 0.13.1 > Reporter: Justin Leet > Assignee: Justin Leet > Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
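The collision described in the comment above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `StrippedAliasCheck` and its methods are hypothetical, it does not call convertPigSchemaToHCatSchema() or any HCatalog API, and the exception text is merely modeled on the error quoted in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the real check happens inside HCatalog's
// schema conversion, not in a helper like this one.
final class StrippedAliasCheck {
    private StrippedAliasCheck() {}

    /** Drops everything up to and including the last "::" in the alias. */
    static String stripNamespace(String alias) {
        int idx = alias.lastIndexOf("::");
        return idx < 0 ? alias : alias.substring(idx + 2);
    }

    /**
     * Strips namespaces from every alias, in order, and rejects the input
     * if two aliases collapse to the same column name.
     */
    static List<String> toColumnNames(List<String> pigAliases) {
        Set<String> seen = new LinkedHashSet<>();
        for (String alias : pigAliases) {
            String name = stripNamespace(alias);
            if (!seen.add(name)) {
                // Message modeled on the error quoted in the comment (assumption).
                throw new IllegalArgumentException("Field named " + name + " already exists");
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Distinct names after stripping: accepted.
        System.out.println(toColumnNames(Arrays.asList("A::a", "B::b")));
        // A::b and B::b both collapse to b: rejected.
        try {
            toColumnNames(Arrays.asList("A::b", "B::b"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing on the collision, rather than silently picking one of the two fields, mirrors the behavior the comment describes: the store aborts instead of writing ambiguous data.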
[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig
[ https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Leet updated HIVE-7898: -- Status: In Progress (was: Patch Available) > HCatStorer should ignore namespaces generated by Pig > > > Key: HIVE-7898 > URL: https://issues.apache.org/jira/browse/HIVE-7898 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.1 > Reporter: Justin Leet >Assignee: Justin Leet >Priority: Minor > Attachments: HIVE-7898.1.patch > > > Currently, Pig aliases must exactly match the names of HCat columns for > HCatStorer to be successful. However, several Pig operations prepend a > namespace to the alias in order to differentiate fields (e.g. after a group > with field b, you might have A::b). In this case, even if the fields are in > the right order and the alias without namespace matches, the store will fail > because it tries to match the long form of the alias, despite the namespace > being extraneous information in this case. Note that multiple aliases can > be applied (e.g. A::B::C::d). > A workaround is possible by doing a > FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc. > This quickly becomes tedious and bloated for tables with many fields. > Changing this would normally require care around columns named, for example, > `A::b` as has been introduced in Hive 13. However, a different function call > only validates Pig aliases if they follow the old rules for Hive columns. As > such, a direct change (rather than attempting to match either the > namespace::alias or just alias) maintains compatibility for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)