[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479541
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 07/Sep/20 08:39
Start Date: 07/Sep/20 08:39
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #1379:
URL: https://github.com/apache/hive/pull/1379


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 479541)
Time Spent: 1h 20m  (was: 1h 10m)

> Hive on Tez :  Kafka storage handler broken in secure environment
> -
>
> Key: HIVE-23408
> URL: https://issues.apache.org/jira/browse/HIVE-23408
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When hive.server2.authentication.kerberos.principal is set in the form
> hive/_HOST@REALM, a Tez task can start on an arbitrary NodeManager (NM) host
> and expand _HOST to the FQDN of the host where it is running, which leads to
> an authentication failure. For LLAP there is a fallback to the LLAP daemon
> keytab/principal. Kafka 1.1 onwards supports delegation tokens, and we should
> take advantage of them for Hive on Tez.
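
To make the failure mode above concrete, here is a minimal sketch of the
_HOST substitution. It is not Hive or Hadoop code: the class
HostPrincipalSketch, the method expandHost and both host names are
hypothetical, and the sketch only assumes that _HOST is replaced by the FQDN
of whichever host performs the login.

```java
// Hypothetical helper for illustration; not Hive or Hadoop code.
final class HostPrincipalSketch {
  private static final String HOST_PLACEHOLDER = "_HOST";

  /** Replaces the _HOST placeholder with the FQDN of the host doing the login. */
  static String expandHost(String principalPattern, String fqdn) {
    return principalPattern.replace(HOST_PLACEHOLDER, fqdn);
  }

  public static void main(String[] args) {
    String pattern = "hive/_HOST@REALM";
    // Both host names are made up for the example.
    String onHiveServer2 = expandHost(pattern, "hs2.example.com");
    String onTezTask = expandHost(pattern, "nm17.example.com");
    System.out.println(onHiveServer2); // hive/hs2.example.com@REALM
    System.out.println(onTezTask);     // hive/nm17.example.com@REALM
    System.out.println("same principal: " + onHiveServer2.equals(onTezTask)); // false
  }
}
```

Because the task-side principal depends on where the task happens to run, a
credential obtained once on the HiveServer2 side and shipped with the DAG
(such as a delegation token) avoids the mismatch.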



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479100
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 13:13
Start Date: 04/Sep/20 13:13
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483607071



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -265,11 +279,70 @@ public URI apply(Path path) {
         }
         dag.addURIsForCredentials(uris);
       }
+      getKafkaCredentials((MapWork)work, dag, conf);
     }
-
     getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+    Token tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+    if (tokenCheck != null) {
+      LOG.debug("Kafka credentials already added, skipping...");
+      return;
+    }
+    LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+    String kafkaBrokers = null;
+    Map partitions = work.getAliasToPartnInfo();

Review comment:
   @ashutoshc : what do you think about this? 
https://github.com/apache/hive/pull/1379/commits/edc4ad440af4e234b731104bbcb9837cbdf43a19#diff-d7b5b051769b68ed5dd602cc30744439R303-R306
   (tested on cluster)
   







Issue Time Tracking
---

Worklog Id: (was: 479100)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=479034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479034
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 04/Sep/20 09:28
Start Date: 04/Sep/20 09:28
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483501753



##
File path: pom.xml
##
@@ -169,6 +169,7 @@
 4.13
 5.6.2
 5.6.2
+2.5.0

Review comment:
   sure!







Issue Time Tracking
---

Worklog Id: (was: 479034)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=478705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478705
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 03/Sep/20 17:24
Start Date: 03/Sep/20 17:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483139073



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -265,11 +279,70 @@ public URI apply(Path path) {
         }
         dag.addURIsForCredentials(uris);
       }
+      getKafkaCredentials((MapWork)work, dag, conf);
     }
-
     getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+    Token tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+    if (tokenCheck != null) {
+      LOG.debug("Kafka credentials already added, skipping...");
+      return;
+    }
+    LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+    String kafkaBrokers = null;
+    Map partitions = work.getAliasToPartnInfo();

Review comment:
   Good point, I was thinking the same: this can become an expensive 
loop that is 100% useless for users not using Kafka. Let me look for a 
way to short-circuit it.







Issue Time Tracking
---

Worklog Id: (was: 478705)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=478704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478704
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 03/Sep/20 17:23
Start Date: 03/Sep/20 17:23
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483139073



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -265,11 +279,70 @@ public URI apply(Path path) {
         }
         dag.addURIsForCredentials(uris);
       }
+      getKafkaCredentials((MapWork)work, dag, conf);
     }
-
     getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+    Token tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+    if (tokenCheck != null) {
+      LOG.debug("Kafka credentials already added, skipping...");
+      return;
+    }
+    LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+    String kafkaBrokers = null;
+    Map partitions = work.getAliasToPartnInfo();

Review comment:
   Good point, I was thinking the same: this can become an expensive 
loop that is 100% useless for users not using Kafka. Let me look for a way to short-circuit it.







Issue Time Tracking
---

Worklog Id: (was: 478704)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=478631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-478631
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 03/Sep/20 15:00
Start Date: 03/Sep/20 15:00
Worklog Time Spent: 10m 
  Work Description: ashutoshc commented on a change in pull request #1379:
URL: https://github.com/apache/hive/pull/1379#discussion_r483044949



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
##
@@ -265,11 +279,70 @@ public URI apply(Path path) {
         }
         dag.addURIsForCredentials(uris);
       }
+      getKafkaCredentials((MapWork)work, dag, conf);
     }
-
     getCredentialsForFileSinks(work, dag);
   }
 
+  private void getKafkaCredentials(MapWork work, DAG dag, JobConf conf) {
+    Token tokenCheck = dag.getCredentials().getToken(KAFKA_DELEGATION_TOKEN_KEY);
+    if (tokenCheck != null) {
+      LOG.debug("Kafka credentials already added, skipping...");
+      return;
+    }
+    LOG.info("Getting kafka credentials for mapwork: " + work.getName());
+
+    String kafkaBrokers = null;
+    Map partitions = work.getAliasToPartnInfo();

Review comment:
   This iterates over all partition objects in the plan even when Kafka is 
not used, which gets expensive when there is a large number of partition objects. 
Is it possible to do a quick check to see whether Kafka is used before iterating 
over the full list of partitions?
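
One possible shape for such a quick check is sketched below. This is not the
code from the pull request: the Plan class, its storageHandlers set (assumed
to be collected once while the plan is built) and the findKafkaBrokers method
are hypothetical, and the handler class name is only used as a marker string.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class KafkaShortCircuitSketch {
  // Assumed handler class name, used here only as a marker string.
  static final String KAFKA_HANDLER = "org.apache.hadoop.hive.kafka.KafkaStorageHandler";

  /** Hypothetical plan: partition alias -> broker list (null for non-Kafka partitions). */
  static final class Plan {
    final Set<String> storageHandlers;          // collected once while the plan is built
    final Map<String, String> partitionBrokers; // may be very large
    int partitionsVisited = 0;                  // instrumentation for the example only

    Plan(Set<String> storageHandlers, Map<String, String> partitionBrokers) {
      this.storageHandlers = storageHandlers;
      this.partitionBrokers = partitionBrokers;
    }
  }

  /** Returns the first broker list found, or null; skips the loop when Kafka is absent. */
  static String findKafkaBrokers(Plan plan) {
    if (!plan.storageHandlers.contains(KAFKA_HANDLER)) {
      return null; // quick check: no Kafka table in the plan, nothing to iterate
    }
    for (String brokers : plan.partitionBrokers.values()) {
      plan.partitionsVisited++;
      if (brokers != null) {
        return brokers; // stop at the first Kafka partition
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> partitions = new LinkedHashMap<>();
    for (int i = 0; i < 1000; i++) {
      partitions.put("p" + i, null); // a large plan without any Kafka partition
    }
    Plan noKafka = new Plan(new HashSet<>(Arrays.asList("orc")), partitions);
    System.out.println(findKafkaBrokers(noKafka)
        + " after visiting " + noKafka.partitionsVisited + " partitions");
  }
}
```

The set lookup costs the same regardless of how many partition objects the
plan holds, so users who never touch Kafka do not pay for the loop.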

##
File path: pom.xml
##
@@ -169,6 +169,7 @@
 4.13
 5.6.2
 5.6.2
+2.5.0

Review comment:
   The Kafka version is also declared explicitly in kafka-handler/pom.xml. Can 
we parameterize it there, so that all of Hive references just one Kafka version?







Issue Time Tracking
---

Worklog Id: (was: 478631)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=468499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468499
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 10/Aug/20 10:42
Start Date: 10/Aug/20 10:42
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1379:
URL: https://github.com/apache/hive/pull/1379#issuecomment-671283300


   @rajkrrsingh , @ashutoshc : could you please take a look?





Issue Time Tracking
---

Worklog Id: (was: 468499)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (HIVE-23408) Hive on Tez : Kafka storage handler broken in secure environment

2020-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23408?focusedWorklogId=467874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467874
 ]

ASF GitHub Bot logged work on HIVE-23408:
-

Author: ASF GitHub Bot
Created on: 07/Aug/20 14:54
Start Date: 07/Aug/20 14:54
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1379:
URL: https://github.com/apache/hive/pull/1379


   …ronment
   
   Change-Id: I486e9169279765b8a695d27264b770c8db92128f
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 467874)
Remaining Estimate: 0h
Time Spent: 10m
