[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-23 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24032:
---
Attachment: HIVE-24032.03.patch
Status: Patch Available  (was: In Progress)

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, 
> HIVE-24032.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-23 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24032:
---
Status: In Progress  (was: Patch Available)

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, 
> HIVE-24032.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=473732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473732
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 24/Aug/20 04:24
Start Date: 24/Aug/20 04:24
Worklog Time Spent: 10m 
  Work Description: pkumarsinha closed pull request #1084:
URL: https://github.com/apache/hive/pull/1084


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473732)
Time Spent: 2h 50m  (was: 2h 40m)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=473715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473715
 ]

ASF GitHub Bot logged work on HIVE-23539:
-

Author: ASF GitHub Bot
Created on: 24/Aug/20 00:41
Start Date: 24/Aug/20 00:41
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1084:
URL: https://github.com/apache/hive/pull/1084#issuecomment-678849350


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473715)
Time Spent: 2h 40m  (was: 2.5h)

> Optimize data copy during repl load operation for HDFS based staging location
> -
>
> Key: HIVE-23539
> URL: https://issues.apache.org/jira/browse/HIVE-23539
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23539.01.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion

2020-08-23 Thread Arko Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182872#comment-17182872
 ] 

Arko Sharma commented on HIVE-23926:


Running this test on the hive-flaky-check framework did not produce any 
errors ([link|http://ci.hive.apache.org/job/hive-flaky-check/88/]), so the 
test has been re-enabled.

 

> Flaky test 
> TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
> 
>
> Key: HIVE-23926
> URL: https://issues.apache.org/jira/browse/HIVE-23926
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23926.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion

2020-08-23 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma updated HIVE-23926:
---
Attachment: HIVE-23926.01.patch
Status: Patch Available  (was: Open)

> Flaky test 
> TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
> 
>
> Key: HIVE-23926
> URL: https://issues.apache.org/jira/browse/HIVE-23926
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23926.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23926:
--
Labels: pull-request-available  (was: )

> Flaky test 
> TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
> 
>
> Key: HIVE-23926
> URL: https://issues.apache.org/jira/browse/HIVE-23926
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23926?focusedWorklogId=473709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473709
 ]

ASF GitHub Bot logged work on HIVE-23926:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 23:07
Start Date: 23/Aug/20 23:07
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1420:
URL: https://github.com/apache/hive/pull/1420


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473709)
Remaining Estimate: 0h
Time Spent: 10m

> Flaky test 
> TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
> 
>
> Key: HIVE-23926
> URL: https://issues.apache.org/jira/browse/HIVE-23926
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-08-23 Thread Ashish Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182783#comment-17182783
 ] 

Ashish Sharma commented on HIVE-22782:
--

[~vgarg] I have raised a PR with the changes and added some test cases. Please 
take a look. Meanwhile, I am thinking about what further test cases I can add.

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints 
> such as primary key, foreign key, not null, etc. Since the planner always 
> retrieves these constraints, we should retrieve all of them in one call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread Shubham Chaurasia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182746#comment-17182746
 ] 

Shubham Chaurasia commented on HIVE-24059:
--

[~prasanth_j] [~jdere] Can you please review?

> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.
> cc [~prasanth_j] [~jdere] [~anishek] [~thejas]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-24059:
-
Description: 
Please see problem description in 
https://issues.apache.org/jira/browse/HIVE-24058

Initial changes include - 

1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
side.
2. Opening additional RPC port in LLAP Daemon.
3. JWT Based authentication on this port.


cc [~prasanth_j] [~jdere] [~anishek] [~thejas]

  was:
Please see problem description in 
https://issues.apache.org/jira/browse/HIVE-24058

Initial changes include - 

1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
side.
2. Opening additional RPC port in LLAP Daemon.
3. JWT Based authentication on this port.



> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.
> cc [~prasanth_j] [~jdere] [~anishek] [~thejas]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread Shubham Chaurasia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182744#comment-17182744
 ] 

Shubham Chaurasia commented on HIVE-24059:
--

This patch uses two environment variables:

{{IS_CLOUD_DEPLOYMENT}} - set when HS2 and LLAP are running in a cloud environment. 
{{PUBLIC_HOSTNAME}} - a public hostname that can be reached from outside the cloud.

Both variables need to be set on the HS2 and LLAP machines for this patch to 
work correctly.
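
A minimal, hypothetical Java sketch of how such variables might be consumed (the `resolveAdvertisedHost` helper and the boolean interpretation of {{IS_CLOUD_DEPLOYMENT}} are assumptions for illustration, not the actual patch code): in a cloud deployment the daemon would advertise {{PUBLIC_HOSTNAME}} instead of its internal address.

```java
import java.util.Map;

public class CloudHostResolver {

    // Takes the environment as a Map so it is easy to exercise; the real
    // code would read System.getenv() on the HS2 / LLAP machines.
    static String resolveAdvertisedHost(Map<String, String> env, String internalHost) {
        boolean cloud = Boolean.parseBoolean(env.getOrDefault("IS_CLOUD_DEPLOYMENT", "false"));
        if (cloud) {
            String publicHost = env.get("PUBLIC_HOSTNAME");
            if (publicHost == null || publicHost.isEmpty()) {
                throw new IllegalStateException(
                    "PUBLIC_HOSTNAME must be set when IS_CLOUD_DEPLOYMENT is set");
            }
            return publicHost;  // reachable from outside the cloud
        }
        return internalHost;    // non-cloud: keep the internal address
    }

    public static void main(String[] args) {
        System.out.println(resolveAdvertisedHost(
            Map.of("IS_CLOUD_DEPLOYMENT", "true", "PUBLIC_HOSTNAME", "gw.example.com"),
            "llap-0.internal"));
    }
}
```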


> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=473644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473644
 ]

ASF GitHub Bot logged work on HIVE-24020:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 13:22
Start Date: 23/Aug/20 13:22
Worklog Time Spent: 10m 
  Work Description: vpnvishv commented on a change in pull request #1382:
URL: https://github.com/apache/hive/pull/1382#discussion_r475219031



##
File path: 
streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
##
@@ -581,16 +582,9 @@ protected RecordUpdater getRecordUpdater(List<String> 
partitionValues, int bucke
   destLocation = new Path(table.getSd().getLocation());
 } else {
   PartitionInfo partitionInfo = 
conn.createPartitionIfNotExists(partitionValues);
-  // collect the newly added partitions. 
connection.commitTransaction() will report the dynamically added
-  // partitions to TxnHandler
-  if (!partitionInfo.isExists()) {
-addedPartitions.add(partitionInfo.getName());
-  } else {
-if (LOG.isDebugEnabled()) {
-  LOG.debug("Partition {} already exists for table {}",
-  partitionInfo.getName(), fullyQualifiedTableName);
-}
-  }
+  // collect the newly added/updated partitions. 
connection.commitTransaction() will report the dynamically
+  // added partitions to TxnHandler
+  addedPartitions.add(partitionInfo.getName());

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473644)
Time Spent: 40m  (was: 0.5h)

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic 
> partitioning into already existing partitions. Looking at the code, we have 
> the following check in AbstractRecordWriter:
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> The *addedPartitions* set above is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in the case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. As a result, the 
> Initiator is unable to trigger automatic compaction.
> Another issue that has been observed is that we are not clearing 
> *addedPartitions* on writer close, which results in information leaking 
> across transactions.
>  
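
The fix described in the review thread can be modeled with a small, self-contained Java sketch. `PartitionInfo` and the writer class below are simplified stand-ins for the Hive streaming classes, not the real API; they illustrate the two changes: collect the partition name whether or not the partition already existed, and clear the set when the writer closes.

```java
import java.util.HashSet;
import java.util.Set;

public class RecordWriterModel {

    static class PartitionInfo {
        final String name;
        final boolean exists;  // true if the partition already existed
        PartitionInfo(String name, boolean exists) {
            this.name = name;
            this.exists = exists;
        }
    }

    final Set<String> addedPartitions = new HashSet<>();

    void onPartitionTouched(PartitionInfo info) {
        // Old behavior: only added when !info.exists, so already-existing
        // partitions never reached addDynamicPartitions and auto-compaction
        // was never triggered for them.
        addedPartitions.add(info.name);  // fixed: collect unconditionally
    }

    Set<String> commitTransaction() {
        // These names would be passed to addDynamicPartitions, moving
        // TXN_COMPONENTS entries to COMPLETED_TXN_COMPONENTS so the
        // Initiator can schedule compaction.
        return new HashSet<>(addedPartitions);
    }

    void close() {
        addedPartitions.clear();  // fixed: no state leaks across transactions
    }
}
```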



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22782:
--
Labels: pull-request-available  (was: )

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints 
> such as primary key, foreign key, not null, etc. Since the planner always 
> retrieves these constraints, we should retrieve all of them in one call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=473636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473636
 ]

ASF GitHub Bot logged work on HIVE-22782:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 11:23
Start Date: 23/Aug/20 11:23
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #1419:
URL: https://github.com/apache/hive/pull/1419


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473636)
Remaining Estimate: 0h
Time Spent: 10m

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints 
> such as primary key, foreign key, not null, etc. Since the planner always 
> retrieves these constraints, we should retrieve all of them in one call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?focusedWorklogId=473627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473627
 ]

ASF GitHub Bot logged work on HIVE-24059:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 09:34
Start Date: 23/Aug/20 09:34
Worklog Time Spent: 10m 
  Work Description: ShubhamChaurasia opened a new pull request #1418:
URL: https://github.com/apache/hive/pull/1418


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473627)
Remaining Estimate: 0h
Time Spent: 10m

> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24059:
--
Labels: pull-request-available  (was: )

> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24059) Llap external client - Initial changes for running in cloud environment

2020-08-23 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia reassigned HIVE-24059:



> Llap external client - Initial changes for running in cloud environment
> ---
>
> Key: HIVE-24059
> URL: https://issues.apache.org/jira/browse/HIVE-24059
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>
> Please see problem description in 
> https://issues.apache.org/jira/browse/HIVE-24058
> Initial changes include - 
> 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) 
> side.
> 2. Opening additional RPC port in LLAP Daemon.
> 3. JWT Based authentication on this port.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24058) Llap external client - Enhancements for running in cloud environment

2020-08-23 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia reassigned HIVE-24058:



> Llap external client - Enhancements for running in cloud environment
> 
>
> Key: HIVE-24058
> URL: https://issues.apache.org/jira/browse/HIVE-24058
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>
> When we query using llap external client library, following happens currently 
> - 
> 1. We first need to get splits using 
> [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226],
>  this just needs Hive server JDBC url. 
> 2. We then submit those splits to llap and obtain record reader to read data 
> using 
> [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140].
>  In this step we need following at client side -
> - {{hive.zookeeper.quorum}}
> - {{hive.llap.daemon.service.hosts}}
> We need to connect to ZooKeeper to discover the LLAP daemons.
> 3. Record reader so obtained needs to [initiate a TCP connection from client 
> to LLAP Daemon to submit the 
> split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
> 4. It also needs to [initiate another TCP connection from client to output 
> format port in LLAP Daemon to read the 
> data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].
> In cloud-based deployments, we may not be able to make direct connections to 
> the ZooKeeper registry and LLAP daemons from the client, as it might run 
> outside the VPC. 
> For 2, we can move the daemon discovery logic into the get_splits UDF 
> itself, which runs in HS2.  
> For scenarios like 3 and 4, we can expose additional ports on LLAP with an 
> additional auth mechanism.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24040) Slightly odd behaviour with CHAR comparisons and string literals

2020-08-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182633#comment-17182633
 ] 

Peter Vary commented on HIVE-24040:
---

[~kuczoram]: You might want to know about this, as I remember you were working 
on a standard way to handle this for Parquet and ORC.

> Slightly odd behaviour with CHAR comparisons and string literals
> 
>
> Key: HIVE-24040
> URL: https://issues.apache.org/jira/browse/HIVE-24040
> Project: Hive
>  Issue Type: Bug
>Reporter: Tim Armstrong
>Priority: Major
>
> If t is a char column, this statement behaves a bit strangely - since the RHS 
> is a STRING, I would have expected it to behave consistently with other 
> CHAR/STRING comparisons, where the CHAR column has its trailing spaces 
> removed and the STRING does not have its trailing spaces removed.
> {noformat}
> select count(*) from ax where t = cast('a ' as string);
> {noformat}
> Instead it seems to be treated the same as if it was a plain literal, 
> interpreted as CHAR, i.e.
> {noformat}
> select count(*) from ax where t = 'a ';
> {noformat}
> Here are some more experiments I did based on 
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/in_typecheck_char.q
>  that seem to show some inconsistencies.
> {noformat}
> -- Hive version 3.1.3000.7.2.1.0-287 r4e72e59f1c2a51a64e0ff37b14bd396cd4e97b98
> create table ax(s char(1),t char(10));
> insert into ax values ('a','a'),('a','a '),('b','bb');
> -- varchar literal preserves trailing space
> select count(*) from ax where t = cast('a ' as varchar(50));
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- explicit cast of literal to string removes trailing space
> select count(*) from ax where t = cast('a ' as string);
> +--+
> | _c0  |
> +--+
> | 2|
> +--+
> -- other string expressions preserve trailing space
> select count(*) from ax where t = concat('a', ' ');
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- varchar col preserves trailing space
> create table stringv as select cast('a  ' as varchar(50));
> select count(*) from ax, stringv where t = `_c0`;
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- string col preserves trailing space
> create table stringa as select 'a  ';
> select count(*) from ax, stringa where t = `_c0`;
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> {noformat}
> [~jcamachorodriguez] [~kgyrtkirk]
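
The trailing-space semantics at issue in the experiments above can be modeled with a short, self-contained Java sketch. The class and helpers are illustrative, not Hive's actual comparison code: CHAR vs CHAR ignores trailing padding on both sides, while CHAR vs STRING is expected to strip only the CHAR side; the reported surprise is that `cast('a ' as string)` behaves like the CHAR case instead.

```java
public class CharCompareModel {

    // CHAR values are space-padded; padding is not significant.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // CHAR vs CHAR: trailing spaces are insignificant on both sides.
    static boolean charVsChar(String a, String b) {
        return stripTrailingSpaces(a).equals(stripTrailingSpaces(b));
    }

    // CHAR vs STRING: only the CHAR side loses its padding; a trailing
    // space in the STRING value remains significant.
    static boolean charVsString(String charVal, String stringVal) {
        return stripTrailingSpaces(charVal).equals(stringVal);
    }
}
```

Under this model, `t = concat('a', ' ')` (a STRING expression) matches zero rows because the STRING keeps its trailing space, whereas treating `cast('a ' as string)` as a CHAR comparison makes it match the padded values, which is the inconsistency reported here.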



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?focusedWorklogId=473626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473626
 ]

ASF GitHub Bot logged work on HIVE-24032:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 08:34
Start Date: 23/Aug/20 08:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1396:
URL: https://github.com/apache/hive/pull/1396#issuecomment-678746263


   Why is this a good thing?
   AFAIK we introduced the shims to abstract out Hadoop dependencies and to be 
able to work with different versions of Hadoop altogether. Removing the shims 
will again tie us to a specific Hadoop version.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473626)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=473617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473617
 ]

ASF GitHub Bot logged work on HIVE-24020:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 08:19
Start Date: 23/Aug/20 08:19
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1382:
URL: https://github.com/apache/hive/pull/1382#discussion_r475189153



##
File path: 
streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java
##
@@ -581,16 +582,9 @@ protected RecordUpdater getRecordUpdater(List<String> 
partitionValues, int bucketId)
   destLocation = new Path(table.getSd().getLocation());
 } else {
   PartitionInfo partitionInfo = 
conn.createPartitionIfNotExists(partitionValues);
-  // collect the newly added partitions. 
connection.commitTransaction() will report the dynamically added
-  // partitions to TxnHandler
-  if (!partitionInfo.isExists()) {
-addedPartitions.add(partitionInfo.getName());
-  } else {
-if (LOG.isDebugEnabled()) {
-  LOG.debug("Partition {} already exists for table {}",
-  partitionInfo.getName(), fullyQualifiedTableName);
-}
-  }
+  // collect the newly added/updated partitions. 
connection.commitTransaction() will report the dynamically
+  // added partitions to TxnHandler
+  addedPartitions.add(partitionInfo.getName());

Review comment:
   Can we please rename 'addedPartitions' to something more closely resembling 
the actual usage, like 'updatedPartitions' or 'changedPartitions'? A comment on 
the attribute and getter would be good as well.
   Otherwise LGTM.
   Thanks, Peter





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473617)
Time Spent: 0.5h  (was: 20m)

> Automatic Compaction not working in existing partitions for Streaming Ingest 
> with Dynamic Partition
> ---
>
> Key: HIVE-24020
> URL: https://issues.apache.org/jira/browse/HIVE-24020
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming, Transactions
>Affects Versions: 4.0.0, 3.1.2
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This issue happens when we try to do streaming ingest with dynamic partitions 
> on already existing partitions. I checked the code; we have the following 
> check in AbstractRecordWriter.
>  
> {code:java}
> PartitionInfo partitionInfo = 
> conn.createPartitionIfNotExists(partitionValues);
> // collect the newly added partitions. connection.commitTransaction() will 
> report the dynamically added
> // partitions to TxnHandler
> if (!partitionInfo.isExists()) {
>   addedPartitions.add(partitionInfo.getName());
> } else {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Partition {} already exists for table {}",
> partitionInfo.getName(), fullyQualifiedTableName);
>   }
> }
> {code}
> Above *addedPartitions* is passed to *addDynamicPartitions* during 
> TransactionBatch commit. So in case of already existing partitions, 
> *addedPartitions* will be empty and *addDynamicPartitions* will not move 
> entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in the 
> Initiator not being able to trigger auto compaction.
> Another issue which has been observed is, we are not clearing 
> *addedPartitions* on writer close, which results in information flowing 
> across transactions.
>  
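The failure mode described above can be modeled with a toy simulation (plain Python, illustrative only — this is not Hive code, and the names `commit`/`track_existing` are invented for the sketch): only partitions reported at commit time get their transaction entries promoted, so writes to a pre-existing partition never become visible to the compaction Initiator.

```python
# Toy model of the dynamic-partition tracking bug (illustrative only).

def commit(written_partitions, existing_partitions, track_existing):
    """Return the set of partitions whose txn entries get promoted
    (i.e. would move from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS
    and become visible to the compaction Initiator)."""
    tracked = set()
    for p in written_partitions:
        if p not in existing_partitions:
            tracked.add(p)        # newly created partition: always tracked
        elif track_existing:
            tracked.add(p)        # fixed behavior: track existing ones too
        # buggy behavior: an already-existing partition is silently skipped
    return tracked

# A transaction writes to one new and one pre-existing partition.
written = {"dt=2020-08-22", "dt=2020-08-23"}
existing = {"dt=2020-08-22"}

buggy = commit(written, existing, track_existing=False)
fixed = commit(written, existing, track_existing=True)

print(sorted(buggy))  # only the newly created partition is compactable
print(sorted(fixed))  # both written partitions are compactable
```

Under the buggy behavior the pre-existing partition `dt=2020-08-22` is missing from the tracked set, which matches the symptom in the report: no entries are promoted for it, so automatic compaction is never triggered.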



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag

2020-08-23 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182622#comment-17182622
 ] 

Peter Vary commented on HIVE-24000:
---

[~dkuzmenko]: I am not against the change, but next time please wait for a +1 
from a committer as per 
[https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess]
{quote}
Committers: for non-trivial changes, it is best to get another committer to 
review your patches before commit. Use the Submit Patch link like other 
contributors, and then wait for a "+1" from another committer before 
committing. Please also try to frequently review things in the patch queue.
{quote}

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=473610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473610
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 06:27
Start Date: 23/Aug/20 06:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r475177713



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String 
dbname, String name, Table
   }
 }
 
+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws 
MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : 
getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List<String> partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, 
dbName, tblName)
+  + ", partitions: " + partitions.toString());
+
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+Partition p =  getMS().getPartition(catName, dbName, tblName, 
partitions);
+Path path = new Path(p.getSd().getLocation());
+
+FileSystem fs = path.getFileSystem(conf);
+RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+while (itr.hasNext()) {
+  FileStatus fStatus = itr.next();
+  Reader reader = OrcFile.createReader(fStatus.getPath(), 
OrcFile.readerOptions(fs.getConf()));
+  boolean isRawFormat  = 
!CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), 
ALL_ACID_ROW_NAMES);
+  int fileFormat = isRawFormat ? 0 : 2;

Review comment:
   Please use an enum on the Java side, but on the thrift side stick to 
strings/ints. Again, when upgrading the HMS API, an enum in thrift can 
cause issues
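The reviewer's enum-versus-int concern can be sketched with a toy model (plain Python, not thrift-generated code; the dictionaries and function names are invented for the illustration): an enum travels on the wire as an i32, but generated code that maps it back to a known constant has nothing to return for values introduced by a newer server, while a plain int field passes through unchanged.

```python
# Toy model: why a plain int survives API evolution better than an enum mapping.

# A client generated against an older API knows only these file formats:
OLD_CLIENT_ENUM = {0: "RAW", 2: "ACID"}

def decode_as_enum(wire_value):
    # Mimics enum lookup in generated code: an unknown ordinal maps to None,
    # forcing every caller to handle a "value from the future".
    return OLD_CLIENT_ENUM.get(wire_value)

def decode_as_int(wire_value):
    # A plain i32 field is passed through untouched; the old client can
    # ignore values it does not understand.
    return wire_value

print(decode_as_enum(2))  # a known value decodes fine
print(decode_as_enum(3))  # a value added by a newer server is lost
print(decode_as_int(3))   # the raw int survives unchanged
```

This is why keeping the wire type an int (with an enum only in the Java layer, as suggested) makes it safer to add new file-format values later.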





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473610)
Time Spent: 2h 40m  (was: 2.5h)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=473609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473609
 ]

ASF GitHub Bot logged work on HIVE-23890:
-

Author: ASF GitHub Bot
Created on: 23/Aug/20 06:25
Start Date: 23/Aug/20 06:25
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r475177465



##
File path: 
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {
+  1: optional list<binary> fileListData,
+  2: optional i32 fbVersionNumber

Review comment:
   AFAIK we try to keep every field optional in the HMS APIs so that we will be 
able to change them if needed. I specifically asked Barna to do this. Please 
correct me if I am mistaken, Vihang. Thanks, Peter 
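The optional-everywhere convention can be illustrated with a toy deserializer (plain Python, not thrift; `read_struct` and the schema dictionaries are invented for the sketch): a required field that is absent on the wire fails deserialization outright, while an optional field simply reads back as unset, so fields can be added or retired across versions without breaking older peers.

```python
def read_struct(wire_fields, schema):
    """Toy deserializer: schema maps field name -> 'required' | 'optional'."""
    result = {}
    for name, requiredness in schema.items():
        if name in wire_fields:
            result[name] = wire_fields[name]
        elif requiredness == "required":
            # A required field missing on the wire is a hard failure.
            raise ValueError("missing required field: " + name)
        else:
            result[name] = None   # optional and absent: simply unset
    return result

# An older client sends a response without the newer fbVersionNumber field:
wire = {"fileListData": [b"\x00"]}

all_optional = {"fileListData": "optional", "fbVersionNumber": "optional"}
strict = {"fileListData": "optional", "fbVersionNumber": "required"}

print(read_struct(wire, all_optional))   # succeeds, new field reads as unset
try:
    read_struct(wire, strict)
except ValueError as e:
    print(e)                             # required field breaks compatibility
```

Keeping every field optional, as the comment suggests, means both directions of version skew degrade to "field unset" instead of a deserialization error.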





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 473609)
Time Spent: 2.5h  (was: 2h 20m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a 
> FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)