Re: HIVE-21894 review?

2020-06-06 Thread Justin Leet
The stale filter on GitHub caught this PR, and I'm still looking for a review.
Do I need to open a new PR, or can someone reopen this one for me? I'm still
willing to put work into this PR, and I think it provides value for the
community, but I need feedback from contributors.

On Tue, Jun 2, 2020 at 3:31 PM Justin Leet wrote:

> I've had a PR out for a while for SSL with the KafkaStorageHandler that
> doesn't rely on plaintext table properties. It has been waiting for both
> general review and specific feedback on a few questions (detailed on the PR
> itself).
>
> Would someone be able to help get this pushed across the finish line?
>
> https://issues.apache.org/jira/browse/HIVE-21894
> https://github.com/apache/hive/pull/839
>


HIVE-21894 review?

2020-06-02 Thread Justin Leet
I've had a PR out for a while for SSL with the KafkaStorageHandler that
doesn't rely on plaintext table properties. It has been waiting for both
general review and specific feedback on a few questions (detailed on the PR
itself).

Would someone be able to help get this pushed across the finish line?

https://issues.apache.org/jira/browse/HIVE-21894
https://github.com/apache/hive/pull/839


[jira] [Created] (HIVE-21861) ClassCastException during CTAS over external table using KafkaStorageHandler

2019-06-11 Thread Justin Leet (JIRA)
Justin Leet created HIVE-21861:
--

         Summary: ClassCastException during CTAS over external table using KafkaStorageHandler
             Key: HIVE-21861
             URL: https://issues.apache.org/jira/browse/HIVE-21861
         Project: Hive
      Issue Type: Bug
      Components: kafka integration
Affects Versions: 0.3.0
        Reporter: Justin Leet


To reproduce, create a table similar to the following:
{code}
 CREATE EXTERNAL TABLE 
 (raw_value STRING)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES(
 "kafka.topic"="",
 "kafka.bootstrap.servers"="",
 "kafka.consumer.security.protocol"="PLAINTEXT",
 "kafka.serde.class"="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
{code}

Note the SerDe isn't the default SerDe.  Additionally, this error occurs when 
vectorization is enabled.

Basic queries work fine:
{code}
SELECT * FROM  LIMIT 1;
{code}

Doing a CTAS to bring it into a managed table fails:
{code}
CREATE TABLE  AS
SELECT * FROM ;
{code}

The exception is: 
{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:471)
    at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
    at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.readNextBatch(VectorizedKafkaRecordReader.java:159)
    at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:113)
    at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:47)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
    ... 24 more
{code}
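
The trace suggests that the vectorized reader casts a string column value to
Text while the configured SerDe hands back a LazyString. The Java sketch below
reproduces that kind of failure in isolation. Every class and method in it
(Text, LazyString, assignStringColumn, assignStringColumnSafely) is a
simplified, hypothetical stand-in written for illustration; none of it is
Hive's actual code, and the tolerant variant is only one possible direction,
not a fix taken from Hive.

```java
// Simplified stand-ins for illustration only; these are NOT Hive's classes.
final class CastFailureSketch {

    /** Stand-in for a writable text value. */
    static final class Text {
        final String value;

        Text(String value) {
            this.value = value;
        }
    }

    /** Stand-in for a lazily deserialized string that wraps a Text. */
    static final class LazyString {
        private final Text data;

        LazyString(String s) {
            this.data = new Text(s);
        }

        Text getWritableObject() {
            return data;
        }
    }

    /** Mimics an assigner that assumes every string column is already a Text. */
    static String assignStringColumn(Object columnValue) {
        // Throws ClassCastException when columnValue is a LazyString.
        return ((Text) columnValue).value;
    }

    /** Tolerant variant: unwraps the lazy object before casting. */
    static String assignStringColumnSafely(Object columnValue) {
        Object unwrapped = columnValue;
        if (columnValue instanceof LazyString) {
            unwrapped = ((LazyString) columnValue).getWritableObject();
        }
        return ((Text) unwrapped).value;
    }
}
```

Passing a LazyString to assignStringColumn fails with a ClassCastException,
while assignStringColumnSafely returns the wrapped string.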

A workaround is to disable vectorization:
{code}
set hive.vectorized.execution.enabled = false;
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2015-01-05 Thread Justin Leet (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Leet updated HIVE-7898:
--
Status: In Progress  (was: Patch Available)

 HCatStorer should ignore namespaces generated by Pig
 

              Key: HIVE-7898
              URL: https://issues.apache.org/jira/browse/HIVE-7898
          Project: Hive
       Issue Type: Improvement
       Components: HCatalog
 Affects Versions: 0.13.1
         Reporter: Justin Leet
         Assignee: Justin Leet
         Priority: Minor
      Attachments: HIVE-7898.1.patch


 Currently, Pig aliases must exactly match the names of HCat columns for
 HCatStorer to succeed.  However, several Pig operations prepend a namespace
 to the alias to differentiate fields (e.g. after a group with field b, you
 might have A::b).  In this case, even if the fields are in the right order
 and the alias without the namespace matches, the store fails because it
 tries to match the long form of the alias, even though the namespace is
 extraneous here.  Note that multiple namespaces can be prepended
 (e.g. A::B::C::d).

 A workaround is to alias each field explicitly:
 FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc.
 This quickly becomes tedious and bloated for tables with many fields.

 Changing this would normally require care around columns that are themselves
 named, for example, `A::b`, which Hive 0.13 allows.  However, a separate
 function call only accepts Pig aliases that follow the old rules for Hive
 column names.  As such, a direct change (always stripping the namespace,
 rather than attempting to match either namespace::alias or just alias)
 maintains compatibility for now.





[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-12-30 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261275#comment-14261275
 ] 

Justin Leet commented on HIVE-7898:
---

This already happens with my patch. HCatStorer aborts with an error, e.g.
"Field named field already exists".  This isn't specifically in
HCatBaseStorer; it occurs during the conversion from Pig Schema to HCatSchema
in convertPigSchemaToHCatSchema(). The modified getColFromSchema passes the
now-truncated name, so convertPigSchemaToHCatSchema() attempts to add the
now-duplicated column, and HCat won't allow the duplicate field through.
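
To illustrate the behavior described in this comment, here is a minimal Java
sketch of how stripping namespaces can collapse two distinct Pig aliases into
the same column name, which a schema-building step then rejects. The class and
method names (DuplicateAliasSketch, stripNamespace, toColumnNames) are
hypothetical and only mimic the idea; this is not the code of
getColFromSchema or convertPigSchemaToHCatSchema(), and the exception type is
an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the HCatalog implementation.
final class DuplicateAliasSketch {

    /** Drops everything up to and including the last "::" in the alias. */
    static String stripNamespace(String alias) {
        int idx = alias.lastIndexOf("::");
        return idx < 0 ? alias : alias.substring(idx + 2);
    }

    /**
     * Converts Pig aliases to column names, rejecting any name that
     * appears twice once the namespaces have been stripped.
     */
    static List<String> toColumnNames(List<String> pigAliases) {
        Set<String> seen = new HashSet<>();
        List<String> columns = new ArrayList<>();
        for (String alias : pigAliases) {
            String name = stripNamespace(alias);
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                        "Field named " + name + " already exists");
            }
            columns.add(name);
        }
        return columns;
    }
}
```

With aliases A::x and B::y the sketch yields the columns x and y, whereas
A::field and B::field both truncate to field and the second one is rejected.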



[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-12-30 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261274#comment-14261274
 ] 

Justin Leet commented on HIVE-7898:
---

This already happens with my patch. HCatStorer aborts with an error, e.g.
"Field named field already exists".  This isn't specifically in
HCatBaseStorer; it occurs during the conversion from Pig Schema to HCatSchema
in convertPigSchemaToHCatSchema(). The modified getColFromSchema passes the
now-truncated name, so convertPigSchemaToHCatSchema() attempts to add the
now-duplicated column, and HCat won't allow the duplicate field through.



[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-10-16 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173733#comment-14173733
 ] 

Justin Leet commented on HIVE-7898:
---

Anybody willing to review this? https://reviews.apache.org/r/25140/




[jira] [Commented] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-28 Thread Justin Leet (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113716#comment-14113716
 ] 

Justin Leet commented on HIVE-7898:
---

Both of these tests appear to fail locally with or without my changes.
They are also well outside anything that hcatalog-pig-adapter, and
HCatBaseStorer specifically, would be involved in: both run query files, not
Pig scripts that use the storer.



[jira] [Created] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-27 Thread Justin Leet (JIRA)
Justin Leet created HIVE-7898:
-

         Summary: HCatStorer should ignore namespaces generated by Pig
             Key: HIVE-7898
             URL: https://issues.apache.org/jira/browse/HIVE-7898
         Project: Hive
      Issue Type: Improvement
      Components: HCatalog
Affects Versions: 0.13.1
        Reporter: Justin Leet
        Assignee: Justin Leet
        Priority: Minor


Currently, Pig aliases must exactly match the names of HCat columns for
HCatStorer to succeed.  However, several Pig operations prepend a namespace
to the alias to differentiate fields (e.g. after a group with field b, you
might have A::b).  In this case, even if the fields are in the right order
and the alias without the namespace matches, the store fails because it
tries to match the long form of the alias, even though the namespace is
extraneous here.  Note that multiple namespaces can be prepended
(e.g. A::B::C::d).

A workaround is to alias each field explicitly:
FOREACH relation GENERATE field1 AS field1, field2 AS field2, etc.
This quickly becomes tedious and bloated for tables with many fields.

Changing this would normally require care around columns that are themselves
named, for example, `A::b`, which Hive 0.13 allows.  However, a separate
function call only accepts Pig aliases that follow the old rules for Hive
column names.  As such, a direct change (always stripping the namespace,
rather than attempting to match either namespace::alias or just alias)
maintains compatibility for now.
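
As a rough illustration of the proposed behavior, the following Java sketch
strips any Pig namespace prefix from an alias before it would be compared to
an HCat column name. The class and method names (PigAliasSketch,
stripNamespace) are hypothetical and chosen for this example; the sketch is
not taken from the patch and only assumes that "::" is the namespace
separator, as in the examples above.

```java
// Hypothetical illustration; not the code from HIVE-7898.1.patch.
final class PigAliasSketch {

    private static final String NAMESPACE_SEPARATOR = "::";

    /**
     * Returns the alias with all leading namespaces removed,
     * e.g. "A::B::C::d" becomes "d". Aliases without a namespace
     * are returned unchanged.
     */
    static String stripNamespace(String alias) {
        if (alias == null) {
            return null;
        }
        int idx = alias.lastIndexOf(NAMESPACE_SEPARATOR);
        return idx < 0
                ? alias
                : alias.substring(idx + NAMESPACE_SEPARATOR.length());
    }

    public static void main(String[] args) {
        System.out.println(stripNamespace("A::b"));
        System.out.println(stripNamespace("A::B::C::d"));
        System.out.println(stripNamespace("plain"));
    }
}
```

Taking the text after the last separator handles any depth of nesting in one
step, so A::b and A::B::C::d are treated the same way.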





[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-27 Thread Justin Leet (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Leet updated HIVE-7898:
--

Attachment: HIVE-7898.1.patch



[jira] [Updated] (HIVE-7898) HCatStorer should ignore namespaces generated by Pig

2014-08-27 Thread Justin Leet (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Leet updated HIVE-7898:
--

Status: Patch Available  (was: Open)

Stripped namespace from alias throughout HCatBaseStorer.  Added unit tests for 
storing after performing an operation that gives aliases a namespace.



Review Request 25140: HIVE-7898 HCatStorer should ignore namespaces generated by Pig

2014-08-27 Thread Justin Leet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25140/
---

Review request for hive.


Bugs: HIVE-7898
https://issues.apache.org/jira/browse/HIVE-7898


Repository: hive-git


Description
---

Namespaces on Pig aliases should be ignored, and the original alias should be
used for matching Pig fields to HCat columns.


Diffs
-

  
  hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatBaseStorer.java ae60030
  hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorer.java fcfc642

Diff: https://reviews.apache.org/r/25140/diff/


Testing
---

Added unit tests for storing without namespaces.


Thanks,

Justin Leet



Hive Contributor Request

2014-08-26 Thread Justin Leet
Hi all,

I'd like to be included as a Hive contributor.

My Jira username is: justinleet

Thanks,
Justin