[jira] [Updated] (HIVE-23363) Upgrade DataNucleus dependency to 5.2

2020-07-08 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-23363:

Fix Version/s: 4.0.0
 Assignee: David Mollitor  (was: Zoltan Chovan)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, [~belugabehr]

> Upgrade DataNucleus dependency to 5.2
> -
>
> Key: HIVE-23363
> URL: https://issues.apache.org/jira/browse/HIVE-23363
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Zoltan Chovan
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23363.2.patch, HIVE-23363.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Upgrade DataNucleus from 4.2 to 5.2, as according to its docs 4.2 has been 
> retired:
> [http://www.datanucleus.org/documentation/products.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-07-08 Thread Ashutosh Chauhan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154169#comment-17154169
 ] 

Ashutosh Chauhan commented on HIVE-23770:
-

[~nishantbangarwa] is this patch ready for commit?

> Druid filter translation unable to handle inverted between
> --
>
> Key: HIVE-23770
> URL: https://issues.apache.org/jira/browse/HIVE-23770
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23770.1.patch, HIVE-23770.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Druid filter translation happens in Calcite and does not use the HiveBetween 
> inverted flag during translation; this misses a negation in the planned query.





[jira] [Updated] (HIVE-23811) deleteReader SARG rowId/bucketId are not getting validated properly

2020-07-08 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-23811:
--
Status: Patch Available  (was: Open)

> deleteReader SARG rowId/bucketId are not getting validated properly
> ---
>
> Key: HIVE-23811
> URL: https://issues.apache.org/jira/browse/HIVE-23811
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Though we are iterating over the min/max stripeIndex, we always seem to pick 
> ColumnStats from the first stripe:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L596]
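The suspected pattern can be sketched with a minimal loop (illustrative only; the real logic lives at the line linked above):

```java
import java.util.List;

// Illustrative sketch of the suspected bug: the loop variable advances over
// the stripe range, but the stats lookup ignores it and always reads index 0.
// This is not the actual VectorizedOrcAcidRowBatchReader code.
public class StripeStatsLookup {
    static int buggyMax(List<Integer> stripeMaxRowIds, int firstStripe, int lastStripe) {
        int max = Integer.MIN_VALUE;
        for (int i = firstStripe; i <= lastStripe; i++) {
            max = Math.max(max, stripeMaxRowIds.get(0)); // bug: should be get(i)
        }
        return max;
    }

    static int fixedMax(List<Integer> stripeMaxRowIds, int firstStripe, int lastStripe) {
        int max = Integer.MIN_VALUE;
        for (int i = firstStripe; i <= lastStripe; i++) {
            max = Math.max(max, stripeMaxRowIds.get(i)); // use the current stripe's stats
        }
        return max;
    }

    public static void main(String[] args) {
        List<Integer> stats = List.of(10, 50, 90);
        System.out.println(buggyMax(stats, 0, 2)); // 10
        System.out.println(fixedMax(stats, 0, 2)); // 90
    }
}
```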





[jira] [Commented] (HIVE-23800) Make HiveServer2 oom hook interface

2020-07-08 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154134#comment-17154134
 ] 

Zhihua Deng commented on HIVE-23800:


The oom hook holds a HiveServer2 instance, which calls HiveServer2::stop() to 
end HiveServer2 gracefully, cleaning up the scratch (staging) 
directory, operation logs, and so on. Although the hooks in the driver can handle 
oom, they may not be able to stop HiveServer2 as gracefully as the oom hook 
does. Sometimes we may want to dump the heap for further analysis when oom 
happens, or alert the devops, so it may be better to make the oom hook here an 
interface.
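The idea can be sketched as a minimal hook interface plus a runner (all names here are hypothetical, not Hive's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an OOM hook interface for HiveServer2.
// OomHook and OomHookRunner are illustrative names, not Hive's real classes.
interface OomHook {
    void onOutOfMemory();
}

class OomHookRunner {
    private final List<OomHook> hooks = new ArrayList<>();

    void register(OomHook hook) { hooks.add(hook); }

    // Invoked from the OOM handling path; each hook may dump the heap,
    // alert operators, or stop the server gracefully.
    void run() {
        for (OomHook hook : hooks) {
            hook.onOutOfMemory();
        }
    }
}

public class OomHookDemo {
    public static void main(String[] args) {
        OomHookRunner runner = new OomHookRunner();
        List<String> actions = new ArrayList<>();
        runner.register(() -> actions.add("dump-heap"));
        runner.register(() -> actions.add("stop-server"));
        runner.run();
        System.out.println(actions); // [dump-heap, stop-server]
    }
}
```

With such an interface, users could plug in their own hook implementations instead of relying on a single hard-coded behavior.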

> Make HiveServer2 oom hook interface
> ---
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Make the oom hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops.





[jira] [Updated] (HIVE-23800) Make HiveServer2 oom hook interface

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23800:
---
Description: Make the oom hook an interface of HiveServer2, so users can 
implement the hook to do something before HS2 stops, such as dumping the heap 
or alerting the devops.

> Make HiveServer2 oom hook interface
> ---
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Make the oom hook an interface of HiveServer2, so users can implement the hook 
> to do something before HS2 stops, such as dumping the heap or alerting the 
> devops.





[jira] [Updated] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23727:
---
Summary: Improve SQLOperation log handling when cancel background  (was: 
Improve SQLOperation log handling when cleanup)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Assigned] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-23727:
--

Assignee: Zhihua Deng

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Issue Comment Deleted] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23727:
---
Comment: was deleted

(was: Fix the log output only, refine the condition in the future if needed.)

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Issue Comment Deleted] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23727:
---
Comment: was deleted

(was: In a busy env, the operation may be pending (asyncPrepare is enabled), so 
it's better to change the condition from if (shouldRunAsync() && state != 
OperationState.CANCELED && state != OperationState.TIMEDOUT) to if 
(shouldRunAsync() && oldState == OperationState.PENDING).)
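The condition change proposed in the comment above can be illustrated with a small state check (OperationState is mirrored here as a plain enum; this is a sketch, not the actual SQLOperation code):

```java
// Sketch of the cancel-background condition discussed above.
// OperationState is mimicked as a plain enum; this is not SQLOperation itself.
enum OperationState { INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, TIMEDOUT, CLOSED }

public class CancelCondition {
    // Current condition: cancel unless already canceled or timed out.
    static boolean currentCondition(boolean runAsync, OperationState state) {
        return runAsync && state != OperationState.CANCELED && state != OperationState.TIMEDOUT;
    }

    // Proposed condition from the comment: only cancel the background task
    // if the operation was still pending when cleanup started.
    static boolean proposedCondition(boolean runAsync, OperationState oldState) {
        return runAsync && oldState == OperationState.PENDING;
    }

    public static void main(String[] args) {
        // A finished operation passes the current check (a needless cancel)...
        System.out.println(currentCondition(true, OperationState.FINISHED)); // true
        // ...but not the proposed one.
        System.out.println(proposedCondition(true, OperationState.FINISHED)); // false
    }
}
```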

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Issue Comment Deleted] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23727:
---
Comment: was deleted

(was: I'm wondering if we can improve the whole branch if (shouldRunAsync() && 
state != OperationState.CANCELED && state != OperationState.TIMEDOUT) here. 
The code here is somewhat confusing to me: state = OperationState.CLOSED is 
the only case in which canceling the background task takes effect, and at that 
point the operation may be finished, closed, failed, running (ctrl+c or session 
timeout) or pending. There is no need to cancel finished, closed, or failed 
operations, and running operations can be treated like timed-out operations, 
which are cleaned up by driver::close.)

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Issue Comment Deleted] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-08 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23727:
---
Comment: was deleted

(was: [~ychena] [~ctang] Could you take some time to look at this?)

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If true, the state should not be OperationState.CANCELED, so 
> logging under the state == OperationState.CANCELED should never happen.
>  





[jira] [Work logged] (HIVE-23797) Throw exception when no metastore found in zookeeper

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23797?focusedWorklogId=456397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456397
 ]

ASF GitHub Bot logged work on HIVE-23797:
-

Author: ASF GitHub Bot
Created on: 09/Jul/20 01:54
Start Date: 09/Jul/20 01:54
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1201:
URL: https://github.com/apache/hive/pull/1201#issuecomment-655848906


   @belugabehr can you take another look at the changes? thank you!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456397)
Time Spent: 50m  (was: 40m)

> Throw exception when no metastore  found in zookeeper
> -
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that 
> the client may find no metastore uris available in zookeeper, such as during 
> metastore startup or when the client has wrongly configured the path. This 
> results in redundant retries and finally a MetaException with an "Unknown 
> exception" message.
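The proposed behavior amounts to failing fast when the resolved URI list is empty, with a descriptive message instead of a generic one. A hedged sketch (class and method names are illustrative; MetaException is mimicked locally):

```java
import java.util.List;

// Illustrative sketch: fail fast with a descriptive message when service
// discovery returns no metastore URIs, instead of retrying and ending in a
// generic "Unknown exception". Names here are hypothetical, not Hive's API.
public class MetastoreUriResolver {
    static class MetaException extends RuntimeException {
        MetaException(String msg) { super(msg); }
    }

    static List<String> requireUris(List<String> urisFromZooKeeper, String path) {
        if (urisFromZooKeeper.isEmpty()) {
            // Surface the misconfiguration (or startup race) immediately.
            throw new MetaException("No metastore server registered in ZooKeeper under " + path);
        }
        return urisFromZooKeeper;
    }

    public static void main(String[] args) {
        try {
            requireUris(List.of(), "/hive/metastore");
        } catch (MetaException e) {
            System.out.println(e.getMessage());
        }
    }
}
```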





[jira] [Work logged] (HIVE-23347) MSCK REPAIR cannot discover partitions with upper case directory names.

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23347?focusedWorklogId=456371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456371
 ]

ASF GitHub Bot logged work on HIVE-23347:
-

Author: ASF GitHub Bot
Created on: 09/Jul/20 00:32
Start Date: 09/Jul/20 00:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1003:
URL: https://github.com/apache/hive/pull/1003


   





Issue Time Tracking
---

Worklog Id: (was: 456371)
Time Spent: 0.5h  (was: 20m)

> MSCK REPAIR cannot discover partitions with upper case directory names.
> ---
>
> Key: HIVE-23347
> URL: https://issues.apache.org/jira/browse/HIVE-23347
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.1.0
>Reporter: Sankar Hariappan
>Assignee: Adesh Kumar Rao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23347.01.patch, HIVE-23347.10.patch, 
> HIVE-23347.2.patch, HIVE-23347.3.patch, HIVE-23347.4.patch, 
> HIVE-23347.5.patch, HIVE-23347.6.patch, HIVE-23347.7.patch, 
> HIVE-23347.8.patch, HIVE-23347.9.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For the following scenario, we expect MSCK REPAIR to discover partitions, but 
> it cannot.
> 1. Have partitioned data paths as follows.
> hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=10
> hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=11
> 2. create external table t1 (key int, value string) partitioned by (Year int, 
> Month int, Day int) stored as orc location 'hdfs://mycluster/datapath/t1';
> 3. msck repair table t1;
> 4. show partitions t1; --> Returns zero partitions
> 5. select * from t1; --> Returns empty data.
> When the partition directory names are changed to lower case, this works fine.
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=10
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=11
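A plausible cause is a case-sensitive match between directory names and the metastore's lower-cased partition column names; normalizing the key before matching avoids it. The helper below is a hypothetical sketch, not the actual MSCK implementation:

```java
import java.util.Locale;

// Hypothetical sketch: parse a partition directory name such as "Year=2020"
// and normalize the column name to lower case before matching it against the
// metastore's partition columns (which are stored in lower case).
public class PartitionNameNormalizer {
    static String normalize(String dirName) {
        int eq = dirName.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("not a partition dir: " + dirName);
        }
        // Lower-case only the column name; keep the value untouched.
        return dirName.substring(0, eq).toLowerCase(Locale.ROOT) + dirName.substring(eq);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Year=2020")); // year=2020
        System.out.println(normalize("Month=03"));  // month=03
    }
}
```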





[jira] [Updated] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23822:
--
Labels: pull-request-available  (was: )

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has a reproducer where the INSERT query is missing the auto stats task.





[jira] [Work logged] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?focusedWorklogId=456363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456363
 ]

ASF GitHub Bot logged work on HIVE-23822:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 23:00
Start Date: 08/Jul/20 23:00
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 opened a new pull request #1231:
URL: https://github.com/apache/hive/pull/1231


   …at task
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 456363)
Remaining Estimate: 0h
Time Spent: 10m

> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{mm_dp}} has a reproducer where the INSERT query is missing the auto stats task.





[jira] [Assigned] (HIVE-23822) Sorted dynamic partition optimization could remove auto stat task

2020-07-08 Thread Vineet Garg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-23822:
--


> Sorted dynamic partition optimization could remove auto stat task
> -
>
> Key: HIVE-23822
> URL: https://issues.apache.org/jira/browse/HIVE-23822
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> {{mm_dp}} has a reproducer where the INSERT query is missing the auto stats task.





[jira] [Resolved] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-07-08 Thread Ashutosh Chauhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-23277.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks, Attila!

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!





[jira] [Commented] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-07-08 Thread Ashutosh Chauhan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154024#comment-17154024
 ] 

Ashutosh Chauhan commented on HIVE-23277:
-

+1

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!





[jira] [Commented] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154002#comment-17154002
 ] 

Naveen Gangam commented on HIVE-23780:
--

That makes sense. I forgot that there are no JDO mappings for these tables, 
which is probably why we have this AcidListener in the first place. I haven't 
reviewed the test changes, but the code changes look good to me. +1 from me.

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes the 
> dropped table's name to be unusable by new tables.
> [~pvary] [~ngangam]





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Attachment: HIVE-23069.01.patch

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In the 
> case of a database with a very large number of tables/partitions, such an 
> iterator may cause the HS2 process to go OOM.
> This also introduces a config option to run data copy tasks during the repl 
> load operation.
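The improvement amounts to replacing a fully materialized list with a lazily evaluated iterator, so only one element is held in memory at a time. A sketch under assumed names (these are not the Hive replication classes):

```java
import java.util.Iterator;
import java.util.function.IntFunction;

// Illustrative sketch: instead of materializing every table/partition path in
// a list (O(n) memory), expose a lazy iterator that fetches each element on
// demand. LazyPathIterator is a hypothetical name.
public class LazyPathIterator implements Iterator<String> {
    private final int count;
    private final IntFunction<String> fetch; // e.g. a per-index metastore lookup
    private int next = 0;

    LazyPathIterator(int count, IntFunction<String> fetch) {
        this.count = count;
        this.fetch = fetch;
    }

    @Override public boolean hasNext() { return next < count; }

    @Override public String next() { return fetch.apply(next++); }

    public static void main(String[] args) {
        // Only one path is ever held in memory at a time.
        Iterator<String> it = new LazyPathIterator(3, i -> "/warehouse/tbl/part=" + i);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```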





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Attachment: (was: HIVE-23069.01.patch)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In the 
> case of a database with a very large number of tables/partitions, such an 
> iterator may cause the HS2 process to go OOM.
> This also introduces a config option to run data copy tasks during the repl 
> load operation.





[jira] [Commented] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153829#comment-17153829
 ] 

Mustafa Iman commented on HIVE-23780:
-

[~ngangam] I had initially considered doing what you said. It was harder to do 
it that way because ObjectStore uses JDO and this cleanup happens using raw 
SQL. I could not find a way for them to share the same transaction. Then Peter 
said we should do this using the transactionalListener anyway, for the reasons 
he explained above.

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes the 
> dropped table's name to be unusable by new tables.
> [~pvary] [~ngangam]





[jira] [Work logged] (HIVE-23819) Use ranges in ValidReadTxnList serialization

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23819?focusedWorklogId=456269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456269
 ]

ASF GitHub Bot logged work on HIVE-23819:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 17:53
Start Date: 08/Jul/20 17:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1230:
URL: https://github.com/apache/hive/pull/1230#issuecomment-655666860


   Can we have microbenchmarks for serializing/deserializing, at least for the 
edge cases?
   * Everything is one big range
   * Everything is a single event
   * Everything is a range of length 2
   
   Thanks,
   Peter





Issue Time Tracking
---

Worklog Id: (was: 456269)
Time Spent: 20m  (was: 10m)

> Use ranges in ValidReadTxnList serialization
> 
>
> Key: HIVE-23819
> URL: https://issues.apache.org/jira/browse/HIVE-23819
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> From time to time we see a case where the open/aborted transaction count is 
> high, and often the aborted transactions come in continuous ranges.
> When the transaction count goes high, the serialization/deserialization to 
> the hive.txn.valid.txns conf gets slower and produces a large config value.
> Using ranges in the string representation can mitigate the issue somewhat.
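Range-encoding a sorted list of txn ids can be sketched like this (the textual format below is made up for the sketch; it is not Hive's actual serialization):

```java
import java.util.StringJoiner;

// Illustrative range encoder: a sorted array of aborted/open txn ids is
// written as "lo-hi" runs instead of one id per entry, which shrinks the
// serialized value when ids come in continuous ranges.
public class TxnRangeEncoder {
    static String encode(long[] sortedTxnIds) {
        StringJoiner out = new StringJoiner(",");
        int i = 0;
        while (i < sortedTxnIds.length) {
            int j = i;
            // Extend the run while ids stay consecutive.
            while (j + 1 < sortedTxnIds.length && sortedTxnIds[j + 1] == sortedTxnIds[j] + 1) {
                j++;
            }
            out.add(i == j ? Long.toString(sortedTxnIds[i])
                           : sortedTxnIds[i] + "-" + sortedTxnIds[j]);
            i = j + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(new long[]{5, 6, 7, 10, 12, 13})); // 5-7,10,12-13
    }
}
```

Decoding simply splits on "," and expands each "lo-hi" run back into ids; both directions are linear in the number of runs rather than the number of transactions.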





[jira] [Commented] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153797#comment-17153797
 ] 

Peter Vary commented on HIVE-23780:
---

Sorry [~ngangam], just realized that you commented on the jira (after pushing 
the change :( ).

To answer your question, we very strictly try to separate TxnHandler stuff from 
ObjectStore stuff - if we want to split out transaction-related classes later, 
this way we can do it without too much effort. Also, the transactionalListener 
was created for just this purpose; we use it for the notifications as well.
So, all in all, the answer is that we want to keep TxnHandler as separate from 
ObjectStore as possible.

Thanks,
Peter

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes the 
> dropped table's name to be unusable by new tables.
> [~pvary] [~ngangam]





[jira] [Work logged] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23780?focusedWorklogId=456258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456258
 ]

ASF GitHub Bot logged work on HIVE-23780:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 17:38
Start Date: 08/Jul/20 17:38
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1192:
URL: https://github.com/apache/hive/pull/1192


   





Issue Time Tracking
---

Worklog Id: (was: 456258)
Time Spent: 0.5h  (was: 20m)

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes the 
> dropped table's name to be unusable by new tables.
> [~pvary] [~ngangam]





[jira] [Resolved] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23780.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the patch [~mustafaiman]!

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes the 
> dropped table's name to be unusable by new tables.
> [~pvary] [~ngangam]





[jira] [Commented] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-07-08 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153742#comment-17153742
 ] 

Naveen Gangam commented on HIVE-23780:
--

[~mustafaiman] [~pvary] I don't have enough context on this AcidListener and 
why it's being done here, but would it make sense to do this logic as part of 
dropTable, prior to committing the table drop, in the ObjectStore itself? The 
net effect with the proposed fix is about the same (other than the order of 
dropping rows), but this appears cleaner and more predictable.
Just wanted to know your thoughts on it. Thanks


> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Acid cleanup happens after dropTable is committed. If cleanup fails for some 
> reason, there are leftover entries in acid tables. This later causes dropped 
> table's name to be unusable by new tables.
> [~pvary] [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23819) Use ranges in ValidReadTxnList serialization

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23819?focusedWorklogId=456219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456219
 ]

ASF GitHub Bot logged work on HIVE-23819:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 16:00
Start Date: 08/Jul/20 16:00
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1230:
URL: https://github.com/apache/hive/pull/1230


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456219)
Remaining Estimate: 0h
Time Spent: 10m

> Use ranges in ValidReadTxnList serialization
> 
>
> Key: HIVE-23819
> URL: https://issues.apache.org/jira/browse/HIVE-23819
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From time to time we see a case where the open/aborted transaction count is 
> high and the aborted transactions often come in continuous ranges.
> When the transaction count gets high, the serialization/deserialization to the 
> hive.txn.valid.txns conf gets slower and produces a large config value.
> Using ranges in the string representation can mitigate the issue somewhat.
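
The range encoding described above can be sketched as follows. This is an illustrative standalone sketch, not the actual Hive patch; the `encode` helper and its output format (`lo-hi` for consecutive runs, comma-separated) are assumptions for demonstration only.

```java
public class TxnRanges {
  // Encode a sorted array of txn ids as a comma-separated string,
  // collapsing each run of consecutive ids into a single "lo-hi" range.
  static String encode(long[] ids) {
    StringBuilder sb = new StringBuilder();
    int i = 0;
    while (i < ids.length) {
      // Advance j to the end of the consecutive run starting at i.
      int j = i;
      while (j + 1 < ids.length && ids[j + 1] == ids[j] + 1) {
        j++;
      }
      if (sb.length() > 0) {
        sb.append(',');
      }
      if (j > i) {
        sb.append(ids[i]).append('-').append(ids[j]);
      } else {
        sb.append(ids[i]);
      }
      i = j + 1;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode(new long[]{5, 6, 7, 8, 12, 14, 15}));
  }
}
```

On a workload where aborted transactions cluster (ids 5 through 8 above), the encoded string shrinks from one entry per transaction to one entry per run, which is what makes the config value smaller and faster to parse.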



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23819) Use ranges in ValidReadTxnList serialization

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23819:
--
Labels: pull-request-available  (was: )

> Use ranges in ValidReadTxnList serialization
> 
>
> Key: HIVE-23819
> URL: https://issues.apache.org/jira/browse/HIVE-23819
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From time to time we see a case where the open/aborted transaction count is 
> high and the aborted transactions often come in continuous ranges.
> When the transaction count gets high, the serialization/deserialization to the 
> hive.txn.valid.txns conf gets slower and produces a large config value.
> Using ranges in the string representation can mitigate the issue somewhat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23819) Use ranges in ValidReadTxnList serialization

2020-07-08 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-23819:
--


> Use ranges in ValidReadTxnList serialization
> 
>
> Key: HIVE-23819
> URL: https://issues.apache.org/jira/browse/HIVE-23819
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> From time to time we see a case where the open/aborted transaction count is 
> high and the aborted transactions often come in continuous ranges.
> When the transaction count gets high, the serialization/deserialization to the 
> hive.txn.valid.txns conf gets slower and produces a large config value.
> Using ranges in the string representation can mitigate the issue somewhat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=456189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456189
 ]

ASF GitHub Bot logged work on HIVE-23760:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 15:08
Start Date: 08/Jul/20 15:08
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1216:
URL: https://github.com/apache/hive/pull/1216#issuecomment-655578545


   @pvary, would you mind taking a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456189)
Time Spent: 1.5h  (was: 1h 20m)

> Upgrading to Kafka 2.5 Clients
> --
>
> Key: HIVE-23760
> URL: https://issues.apache.org/jira/browse/HIVE-23760
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Andras Katona
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=456188=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456188
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 15:06
Start Date: 08/Jul/20 15:06
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r451617508



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ *
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine) {
+    this.beeLine = beeLine;
+    ByteArrayOutputStream buf = new ByteArrayOutputStream();
+    try {
+      this.generator = new JsonFactory().createGenerator(buf, JsonEncoding.UTF8);
+    } catch (IOException e) {
+      beeLine.handleException(e);
+    }
+  }
+
+  @Override
+  void printHeader(Rows.Row header) {
+    try {
+      generator.writeStartObject();
+      generator.writeArrayFieldStart("resultset");
+    } catch (IOException e) {
+      beeLine.handleException(e);
+    }
+  }
+
+  @Override
+  void printFooter(Rows.Row header) {
+    try {
+      generator.writeEndArray();
+      generator.writeEndObject();
+      beeLine.output(generator.getOutputTarget().toString());
+      generator.flush();
+    } catch (IOException e) {
+      beeLine.handleException(e);
+    }
+  }
+
+  @Override
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+    String[] head = header.values;
+    String[] vals = row.values;
+    try {
+      for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+        generator.writeFieldName(head[i]);
+        switch (rows.rsMeta.getColumnType(i)) {
+          case Types.TINYINT:
+          case Types.SMALLINT:
+          case Types.INTEGER:
+          case Types.BIGINT:
+          case Types.REAL:
+          case Types.FLOAT:
+          case Types.DOUBLE:
+          case Types.DECIMAL:
+          case Types.NUMERIC:
+            generator.writeNumber(vals[i]);
+            break;
+          case Types.NULL:
+            generator.writeNull();
+            break;
+          case Types.BOOLEAN:
+            generator.writeBoolean(vals[i].equalsIgnoreCase("true"));

Review comment:
   Take a look at using `Boolean#parseBoolean` instead:
   
   
https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html#parseBoolean-java.lang.String-
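
A minimal illustration of the suggested replacement. `Boolean.parseBoolean` is case-insensitive and returns false for null, so both the `equalsIgnoreCase` call and any null check become unnecessary; the demo class name is made up for illustration.

```java
public class ParseBooleanDemo {
  public static void main(String[] args) {
    // parseBoolean returns true only for "true" (ignoring case);
    // any other value, including null, yields false.
    System.out.println(Boolean.parseBoolean("TRUE"));   // true
    System.out.println(Boolean.parseBoolean("false"));  // false
    System.out.println(Boolean.parseBoolean(null));     // false
  }
}
```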





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456188)
Time Spent: 2h 10m  (was: 2h)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=456187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456187
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 15:05
Start Date: 08/Jul/20 15:05
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #421:
URL: https://github.com/apache/hive/pull/421


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456187)
Time Spent: 2h  (was: 1h 50m)

> Add JSON Outputformat support
> -
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
>  Issue Type: Task
>  Components: Beeline
>Reporter: Max Efremov
>Assignee: Hunter Logan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23818) Use String Switch-Case Statement in StatUtils

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23818?focusedWorklogId=456179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456179
 ]

ASF GitHub Bot logged work on HIVE-23818:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 14:53
Start Date: 08/Jul/20 14:53
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1229:
URL: https://github.com/apache/hive/pull/1229


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456179)
Remaining Estimate: 0h
Time Spent: 10m

> Use String Switch-Case Statement in StatUtils
> -
>
> Key: HIVE-23818
> URL: https://issues.apache.org/jira/browse/HIVE-23818
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Switch-case statements on String values are now available in Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23818) Use String Switch-Case Statement in StatUtils

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23818:
--
Labels: pull-request-available  (was: )

> Use String Switch-Case Statement in StatUtils
> -
>
> Key: HIVE-23818
> URL: https://issues.apache.org/jira/browse/HIVE-23818
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Switch-case statements on String values are now available in Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23818) Use String Switch-Case Statement in StatUtils

2020-07-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-23818:
--
Description: Switch-case statements on String values are now available in Java.

> Use String Switch-Case Statement in StatUtils
> -
>
> Key: HIVE-23818
> URL: https://issues.apache.org/jira/browse/HIVE-23818
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Switch-case statements on String values are now available in Java.
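
A minimal sketch of the Java 7+ String switch this ticket refers to; the type names and mapping below are made up for illustration and are not StatUtils' actual cases.

```java
public class StringSwitchDemo {
  // Since Java 7, switch works directly on String values, replacing
  // chains of if/else with equals() or equalsIgnoreCase() calls.
  static String kind(String type) {
    switch (type) {
      case "int":
      case "bigint":
        return "numeric";
      case "string":
      case "varchar":
        return "text";
      default:
        return "other";
    }
  }

  public static void main(String[] args) {
    System.out.println(kind("bigint")); // numeric
  }
}
```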



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22301) Hive lineage is not generated for insert overwrite queries on partitioned tables

2020-07-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-22301.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you Jesus and Denys for reviewing the changes!

> Hive lineage is not generated for insert overwrite queries on partitioned 
> tables
> 
>
> Key: HIVE-22301
> URL: https://issues.apache.org/jira/browse/HIVE-22301
> Project: Hive
>  Issue Type: Bug
>  Components: lineage
>Affects Versions: 3.1.2
>Reporter: Sidharth Kumar Mishra
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: ScreenShot HookContext.png, ScreenShot 
> RunPostExecHook.png, ScreenShot runBeforeExecution.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Problem: When I run the below mentioned queries, the last query should have 
> given the proper hive lineage info (through HookContext) from table_b to 
> table_t.
>  * Create table table_t (id int) partitioned by (dob date);
>  * Create table table_b (id int) partitioned by (dob date);
>  * from table_b a insert overwrite table table_t select a.id,a.dob;
> Note: for a CTAS query from a partitioned table, this issue is not seen. The 
> issue is seen only for insert queries like insert into  select * from  and 
> queries like the one above.
>  
> Technical Observations:
> The HookContext (passed from hive.ql.Driver to the Atlas Hive hook through the 
> hookRunner.runPostExecHooks call) contains no outputs. Check the screenshot 
> below from IntelliJ.
> !ScreenShot RunPostExecHook.png|width=728,height=427!
>  
> I found that the PrivateHookContext is getting created with proper outputs 
> value as shown below initially:
>   !ScreenShot HookContext.png|width=714,height=541!
> The same is passed properly to runBeforeExecutionHook as shown below:
> !ScreenShot runBeforeExecution.png|width=719,height=620!
>  
> Later when we pass HookContext to runPostExecHooks, there is no output 
> populated. Kindly check the reason and let me know if you need any further 
> information from my end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23818) Use String Switch-Case Statement in StatUtils

2020-07-08 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23818:
-


> Use String Switch-Case Statement in StatUtils
> -
>
> Key: HIVE-23818
> URL: https://issues.apache.org/jira/browse/HIVE-23818
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22301) Hive lineage is not generated for insert overwrite queries on partitioned tables

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22301?focusedWorklogId=456178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456178
 ]

ASF GitHub Bot logged work on HIVE-22301:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 14:51
Start Date: 08/Jul/20 14:51
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1210:
URL: https://github.com/apache/hive/pull/1210


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456178)
Time Spent: 0.5h  (was: 20m)

> Hive lineage is not generated for insert overwrite queries on partitioned 
> tables
> 
>
> Key: HIVE-22301
> URL: https://issues.apache.org/jira/browse/HIVE-22301
> Project: Hive
>  Issue Type: Bug
>  Components: lineage
>Affects Versions: 3.1.2
>Reporter: Sidharth Kumar Mishra
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: ScreenShot HookContext.png, ScreenShot 
> RunPostExecHook.png, ScreenShot runBeforeExecution.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Problem: When I run the below mentioned queries, the last query should have 
> given the proper hive lineage info (through HookContext) from table_b to 
> table_t.
>  * Create table table_t (id int) partitioned by (dob date);
>  * Create table table_b (id int) partitioned by (dob date);
>  * from table_b a insert overwrite table table_t select a.id,a.dob;
> Note: for a CTAS query from a partitioned table, this issue is not seen. The 
> issue is seen only for insert queries like insert into  select * from  and 
> queries like the one above.
>  
> Technical Observations:
> The HookContext (passed from hive.ql.Driver to the Atlas Hive hook through the 
> hookRunner.runPostExecHooks call) contains no outputs. Check the screenshot 
> below from IntelliJ.
> !ScreenShot RunPostExecHook.png|width=728,height=427!
>  
> I found that the PrivateHookContext is getting created with proper outputs 
> value as shown below initially:
>   !ScreenShot HookContext.png|width=714,height=541!
> The same is passed properly to runBeforeExecutionHook as shown below:
> !ScreenShot runBeforeExecution.png|width=719,height=620!
>  
> Later when we pass HookContext to runPostExecHooks, there is no output 
> populated. Kindly check the reason and let me know if you need any further 
> information from my end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=456176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456176
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 14:49
Start Date: 08/Jul/20 14:49
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r451603702



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ *
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine) {
+    this.beeLine = beeLine;
+    ByteArrayOutputStream buf = new ByteArrayOutputStream();
+    try {
+      this.generator = new JsonFactory().createGenerator(buf, JsonEncoding.UTF8);
+    } catch (IOException e) {
+      beeLine.handleException(e);
+    }
+  }
+
+  @Override
+  void printHeader(Rows.Row header) {
+    try {
+      generator.writeStartObject();
+      generator.writeArrayFieldStart("resultset");
+    } catch (IOException e) {
+      beeLine.handleException(e);
+    }
+  }
+
+  @Override
+  void printFooter(Rows.Row header) {
+    try {
+      generator.writeEndArray();
+      generator.writeEndObject();
+      beeLine.output(generator.getOutputTarget().toString());
+      generator.flush();

Review comment:
   Typically you want to `flush` the object to the underlying stream 
(`ByteArrayOutputStream` in this case) in case the object is doing any kind of 
internal buffering in order to "flush" its content out.
   
   While your `beeLine.output` call is technically correct, it's a bit 
confusing to other coders.
   
   Order of operations should be:
   
   1. Flush the generator to clear any buffering into the target `OutputStream`
   2. Convert the `OutputStream` into the target output format (String in this 
situation)
   3. Please use a `new String()` with UTF-8 encoding explicitly specified here
   
   
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#String-byte:A-java.nio.charset.Charset-
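
A standalone sketch of the suggested order of operations, using a plain `OutputStreamWriter` in place of the `JsonGenerator` (both may buffer internally); the class and method names here are illustrative, not Hive code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FlushOrderDemo {
  // Write through a buffering Writer, flush it, then decode the
  // backing stream with an explicitly named charset.
  static String render(String json) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try {
      Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8);
      w.write(json);
      // 1. Flush first: until then the bytes may still sit in the
      //    Writer's internal buffer and buf would be incomplete.
      w.flush();
      // 2. Only now convert the stream, specifying UTF-8 explicitly
      //    instead of relying on the platform default charset.
      return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(render("{\"resultset\":[]}"));
  }
}
```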

##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * 

[jira] [Updated] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23817:
--
Labels: pull-request-available  (was: )

> Pushing TopN Key operator PKFK inner joins
> --
>
> Key: HIVE-23817
> URL: https://issues.apache.org/jira/browse/HIVE-23817
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there is primary key foreign key relationship between the tables we can 
> push the topnkey operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23817?focusedWorklogId=456166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456166
 ]

ASF GitHub Bot logged work on HIVE-23817:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 14:23
Start Date: 08/Jul/20 14:23
Worklog Time Spent: 10m 
  Work Description: zeroflag opened a new pull request #1228:
URL: https://github.com/apache/hive/pull/1228


   ## NOTICE (work in progress)
   
   ### Pushing the TopNKey operator through PK-FK inner joins.
   
   Example: 
   
   Customer table:
   
   ID (PK) | LAST_NAME
   -- | --
   1 | Robinson
   2 | Jones
   3 | Smith
   4 | Heisenberg
   
   Order table:
   
   CUSTOMER_ID (FK) | AMOUNT
   -- | --
   1 | 100
   1 | 50
   2 | 200
   3 | 30
   3 | 40
   
    Requirements for doing TopN Key pushdown.
   
   * The PRIMARY KEY constraint on Customer.ID forbids NULL and duplicate 
values.
   * The NOT_NULL constraint on Order.CUSTOMER_ID forbids NULL values.
   * The FOREIGN KEY constraint between Customer.ID and Order.CUSTOMER_ID then 
ensures that exactly one row exists in the Customer table for any given row in 
the Order table.
   
   In general, if the first n ORDER BY columns come from the child (FK) table, 
we can copy the TopNKey operator with those first n columns and place the copy 
before the join. If all of the columns come from the child table, we can move 
the TopNKey operator rather than copying it.
   
   ```
   SELECT * FROM Customer, Order 
   WHERE Customer.ID = Order.CUSTOMER_ID 
   ORDER BY Order.AMOUNT, [Order.*], [Customer.*] LIMIT 3;
   ```
   
   Result:
   
   CUSTOMER.ID (PK) | CUSTOMER.LAST_NAME | ORDER.AMOUNT
   -- | -- | --
   3 | Smith | 30
   3 | Smith | 40
   1 | Robinson | 50
   1 | Robinson | 100
   2 | Jones | 200
   
   Plan
   
   ```
   Top N Key Operator
 sort order: +
 keys: ORDER.AMOUNT, [ORDER.*]
 top n: 3
 Select Operator (Order)
 [...]
 Join
   [...]
Top N Key Operator
 sort order: +
 keys: ORDER.AMOUNT, [ORDER.*], [Customer.*]
 top n: 3

   ```
   
    Implementation notes
   
   PK-FK join information is extracted on the Calcite side and attached (child 
table index & name) to the AST as a query hint.
   At the physical plan level we use this information to decide whether the 
TopNKey operator can be pushed through. We also need the origins of the ORDER 
BY columns to see whether they come from the child table.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456166)
Remaining Estimate: 0h
Time Spent: 10m

> Pushing TopN Key operator PKFK inner joins
> --
>
> Key: HIVE-23817
> URL: https://issues.apache.org/jira/browse/HIVE-23817
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there is primary key foreign key relationship between the tables we can 
> push the topnkey operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23808) "MSCK REPAIR.. DROP Partitions fail" with kryo Exception

2020-07-08 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-23808:
--

Assignee: Antal Sinkovits

> "MSCK REPAIR.. DROP Partitions fail" with kryo Exception 
> -
>
> Key: HIVE-23808
> URL: https://issues.apache.org/jira/browse/HIVE-23808
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Rajkumar Singh
>Assignee: Antal Sinkovits
>Priority: Major
>
> Steps to reproduce:
> 1. Create an external partitioned table
> 2. Remove some partitions manually by using the hdfs dfs -rm command
> 3. Run "MSCK REPAIR.. DROP Partitions" and it will fail with the following 
> exception
> {code:java}
> 2020-07-06 10:42:11,434 WARN  
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork:
>  [HiveServer2-Background-Pool: Thread-210]: Exception thrown while processing 
> using a batch size 2
> org.apache.hadoop.hive.metastore.utils.MetastoreException: 
> MetaException(message:Index: 117, Size: 0)
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:479) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck$2.execute(Msck.java:432) 
> ~[hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.metastore.Msck.dropPartitionsInBatches(Msck.java:496) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:223) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.ddl.misc.msck.MsckOperation.execute(MsckOperation.java:74)
>  [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> [hive-exec-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at java.security.AccessController.doPrivileged(Native Method) 
> [?:1.8.0_242]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_242]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  [hadoop-common-3.1.1.7.1.1.0-565.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  [hive-service-3.1.3000.7.1.1.0-565.jar:3.1.3000.7.1.1.0-565]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_242]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_242]
> at 

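The WARN from `RetryUtilities$ExponentiallyDecayingBatchWork` above ("Exception thrown while processing using a batch size 2") comes from Hive's batched-retry pattern: run the work in batches and, on failure, retry with a smaller batch. A minimal standalone sketch of that pattern, with illustrative names and a simple halving policy (not Hive's actual `RetryUtilities` API):

```java
// Standalone sketch of an exponentially decaying batch retry.
// Names and the halving policy are illustrative only.
public class DecayingBatchRetry {

    interface BatchWork {
        // Process up to batchSize items; throw on failure.
        void execute(int batchSize) throws Exception;
    }

    // Retries the work, halving the batch size after each failure,
    // and returns the batch size that finally succeeded.
    static int runWithDecay(BatchWork work, int initialBatchSize, int maxRetries)
            throws Exception {
        int batchSize = initialBatchSize;
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries && batchSize > 0; attempt++) {
            try {
                work.execute(batchSize);
                return batchSize;          // this batch size worked
            } catch (Exception e) {
                last = e;
                batchSize /= 2;            // decay and retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Workload that only succeeds once the batch is small enough.
        int ok = runWithDecay(bs -> {
            if (bs > 1) {
                throw new RuntimeException("batch too large: " + bs);
            }
        }, 4, 5);
        System.out.println("succeeded with batch size " + ok);
    }
}
```

Note that decay only helps transient or size-dependent failures; a deterministic error inside one batch, like the `Index: 117, Size: 0` MetaException above, presumably fails again at every smaller batch size.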
[jira] [Updated] (HIVE-23816) Concurrent access of metastore dynamic partition registration API resulting in data loss due to HDFS dir deletion

2020-07-08 Thread rameshkrishnan muthusamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rameshkrishnan muthusamy updated HIVE-23816:

Description: 
During partition registration via the Thrift API, we are noticing that the 
associated HDFS path is being deleted even though it was not created by the 
same process.

This results in data loss in the directory path. In the example below, three 
threads try to create a directory and only one succeeds in registering a 
partition; the other two threads then delete the directory created and 
registered by the original thread.


hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,307 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379217]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,308 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-386717]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,308 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379074]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,314 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: deleting 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,315 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-379217]: deleting 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,321 INFO 
org.apache.hadoop.fs.TrashPolicyDefault: [pool-5-thread-386717]: Moved: 
'hdfs://test_path/dt=2020-07-02/hhmm-0850' to trash at: 
hdfs://user/test/.Trash/Current/test/dt=2020-07-02/hhmm=0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,321 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: Moved to trash: 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,323 ERROR hive.log: 
[pool-5-thread-379217]: Got exception: java.io.IOException Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:java.io.IOException: Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,328 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-379217]: 
MetaException(message:Got exception: java.io.IOException Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850)

 

  was:
During the process of partition registration via thrift api we are noticing 
that the HDFS file path associated is being deleted even though the path was 
not created by the same process. 

This results in loss of data in the dir path. 

 


>  Concurrent access of metastore dynamic partition registration API resulting 
> in data loss due to HDFS dir deletion 
> ---
>
> Key: HIVE-23816
> URL: https://issues.apache.org/jira/browse/HIVE-23816
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: rameshkrishnan muthusamy
>Assignee: rameshkrishnan muthusamy
>Priority: Major
>
> During partition registration via the Thrift API, we are noticing that the 
> associated HDFS path is being deleted even though it was not created by the 
> same process. 
> This results in data loss in the directory path. In the example below, three 
> threads try to create a directory and only one succeeds in registering a 
> partition; the other two threads then delete the directory created and 
> registered by the original thread. 
> hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,307 INFO 
> org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379217]: Creating 
> directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
> hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,308 INFO 
> org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-386717]: Creating 
> directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
> hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,308 INFO 
> org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379074]: Creating 
> directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
> hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,314 INFO 
> hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: deleting 
> hdfs://test_path/dt=2020-07-02/hhmm-0850
> hadoop-cmf-hive-HIVEMETASTORE-**.41:2020-07-02 08:50:31,315 

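The log above shows the race: three pool threads run "Creating directory if it doesn't exist" for the same path, and the two that lose the registration then delete it. A hedged sketch of one mitigation, deleting on rollback only when the current thread actually created the directory (paths and helper names are illustrative, and `java.io.File` stands in for HDFS):

```java
import java.io.File;

// Sketch: only the thread that actually created the directory may
// delete it on rollback. java.io.File stands in for HDFS here.
public class PartitionDirOwnership {

    // mkdirs() returns true only when this call created the directory,
    // so the return value doubles as an ownership flag.
    static boolean createIfMissing(File dir) {
        return dir.mkdirs();
    }

    static void rollback(File dir, boolean createdByUs) {
        if (createdByUs) {
            dir.delete();   // losers of the race never reach this line
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "hhmm-0850-demo-" + System.nanoTime());
        boolean first = createIfMissing(dir);    // "winning" thread creates it
        boolean second = createIfMissing(dir);   // "losing" thread: already exists
        System.out.println("created first=" + first + " second=" + second);
        rollback(dir, second);                   // no-op: loser does not own the dir
        System.out.println("survives loser rollback: " + dir.exists());
        rollback(dir, first);                    // creator cleans up
    }
}
```

The design point is that "create if missing" must return whether *this* caller created the path, and rollback paths must consult that flag instead of unconditionally moving the directory to trash.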
[jira] [Work logged] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23790?focusedWorklogId=456144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456144
 ]

ASF GitHub Bot logged work on HIVE-23790:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 13:56
Start Date: 08/Jul/20 13:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1211:
URL: https://github.com/apache/hive/pull/1211


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456144)
Time Spent: 40m  (was: 0.5h)

> The error message length of 2000 is exceeded for scheduled query
> 
>
> Key: HIVE-23790
> URL: https://issues.apache.org/jira/browse/HIVE-23790
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: 
> [pool-7-thread-189]: Error occurred during processing of message.
> org.datanucleus.exceptions.NucleusUserException: Attempt to store value 
> "FAILED: Execution Error, return code 30045 from 
> org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: 
> user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct 
> your data!
>   at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> 

[jira] [Resolved] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query

2020-07-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23790.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you Jesus for reviewing the changes!

> The error message length of 2000 is exceeded for scheduled query
> 
>
> Key: HIVE-23790
> URL: https://issues.apache.org/jira/browse/HIVE-23790
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: 
> [pool-7-thread-189]: Error occurred during processing of message.
> org.datanucleus.exceptions.NucleusUserException: Attempt to store value 
> "FAILED: Execution Error, return code 30045 from 
> org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: 
> user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct 
> your data!
>   at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java)
>  ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246]
>   at 
> org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java)
>  ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246]
>   at 
> org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> 

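One plausible direction for a fix like this (the actual patch may differ) is to clamp the message to the `ERROR_MESSAGE` column width before handing it to DataNucleus, so that an oversized stack trace such as the HDFS permission error above cannot fail the metastore update. The constant and helper name below are illustrative:

```java
// Hypothetical sketch: truncate an error message to the column width
// before persisting it. The constant and helper are illustrative, not
// Hive's actual code.
public class ErrorMessageClamp {

    static final int MAX_ERROR_MESSAGE_LENGTH = 2000;

    static String clamp(String message) {
        if (message == null || message.length() <= MAX_ERROR_MESSAGE_LENGTH) {
            return message;
        }
        // Keep the head of the message, which usually carries the root cause.
        return message.substring(0, MAX_ERROR_MESSAGE_LENGTH);
    }

    public static void main(String[] args) {
        // Simulate a long failure message with a deep stack trace appended.
        StringBuilder sb = new StringBuilder("FAILED: Execution Error, return code 30045");
        while (sb.length() < 5000) {
            sb.append("\n\tat some.deep.StackFrame.method(File.java:1)");
        }
        String stored = clamp(sb.toString());
        System.out.println(stored.length());
    }
}
```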
[jira] [Work started] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query

2020-07-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23790 started by Zoltan Haindrich.
---
> The error message length of 2000 is exceeded for scheduled query
> 
>
> Key: HIVE-23790
> URL: https://issues.apache.org/jira/browse/HIVE-23790
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: 
> [pool-7-thread-189]: Error occurred during processing of message.
> org.datanucleus.exceptions.NucleusUserException: Attempt to store value 
> "FAILED: Execution Error, return code 30045 from 
> org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: 
> user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388)
>   at 
> org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct 
> your data!
>   at 
> org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java)
>  ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246]
>   at 
> org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java)
>  ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246]
>   at 
> org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170)
>  ~[datanucleus-core-4.1.17.jar:?]
>   at 
> org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:326)
>  ~[datanucleus-rdbms-4.1.19.jar:?]
>   at 
> 

[jira] [Assigned] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

2020-07-08 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23817:



> Pushing TopN Key operator PKFK inner joins
> --
>
> Key: HIVE-23817
> URL: https://issues.apache.org/jira/browse/HIVE-23817
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> If there is a primary key-foreign key relationship between the tables, we can 
> push the TopN Key operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23816) Concurrent access of metastore dynamic partition registration API resulting in data loss due to HDFS dir deletion

2020-07-08 Thread rameshkrishnan muthusamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rameshkrishnan muthusamy reassigned HIVE-23816:
---


>  Concurrent access of metastore dynamic partition registration API resulting 
> in data loss due to HDFS dir deletion 
> ---
>
> Key: HIVE-23816
> URL: https://issues.apache.org/jira/browse/HIVE-23816
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: rameshkrishnan muthusamy
>Assignee: rameshkrishnan muthusamy
>Priority: Major
>
> During partition registration via the Thrift API, we are noticing that the 
> associated HDFS path is being deleted even though it was not created by the 
> same process. 
> This results in data loss in the directory path. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?focusedWorklogId=456117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456117
 ]

ASF GitHub Bot logged work on HIVE-23813:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 13:30
Start Date: 08/Jul/20 13:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1223:
URL: https://github.com/apache/hive/pull/1223#discussion_r451545094



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ReplicationMetricsMaintTask.java
##
@@ -63,13 +63,12 @@ public void run() {
   if (!MetastoreConf.getBoolVar(conf, ConfVars.SCHEDULED_QUERIES_ENABLED)) 
{

Review comment:
   Metrics are always enabled by default, so I didn't want to introduce a new 
config.
   Metric collection depends on whether scheduled queries are enabled; if they 
are not, there is no metric collection for replication, since the primary key 
for the table is the schedule id.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456117)
Time Spent: 50m  (was: 40m)

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23800) Make HiveServer2 oom hook interface

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=456110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456110
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 13:23
Start Date: 08/Jul/20 13:23
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1205:
URL: https://github.com/apache/hive/pull/1205#discussion_r451539918



##
File path: 
service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.server;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.common.JavaUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveServer2OomHookRunner implements Runnable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveServer2OomHookRunner.class);
+  private OomHookContext context;
+  private final List<OomHookWithContext> hooks = new ArrayList<>();
+
+  HiveServer2OomHookRunner(HiveServer2 hiveServer2, HiveConf hiveConf) {
+context = new OomHookContext(hiveServer2);
+// The hs2 has not been initialized yet, hiveServer2.getHiveConf() would 
be null
+init(hiveConf);
+  }
+
+  private void init(HiveConf hiveConf) {
+String csHooks = hiveConf.getVar(ConfVars.HIVE_SERVER2_OOM_HOOKS);
+if (!StringUtils.isBlank(csHooks)) {
+  String[] hookClasses = csHooks.split(",");
+  for (String hookClass : hookClasses) {
+try {
+  Class<?> clazz = JavaUtils.loadClass(hookClass.trim());
+  Constructor<?> ctor = clazz.getDeclaredConstructor();
+  ctor.setAccessible(true);
+  hooks.add((OomHookWithContext)ctor.newInstance());
+} catch (Exception e) {
+  LOG.error("Skip adding oom hook '" + hookClass + "'", e);
+}
+  }
+}
+  }
+
+  @VisibleForTesting
+  public HiveServer2OomHookRunner(HiveConf hiveConf) {
+init(hiveConf);
+  }
+
+  @VisibleForTesting
+  public List<OomHookWithContext> getHooks() {
+return hooks;
+  }
+
+  @Override
+  public void run() {
+for (OomHookWithContext hook : hooks) {
+  hook.run(context);
+}
+  }
+
+  public static interface OomHookWithContext {
+public void run(OomHookContext context);
+  }
+
+  public static class OomHookContext {
+private final HiveServer2 hiveServer2;
+public OomHookContext(HiveServer2 hiveServer2) {
+  this.hiveServer2 = hiveServer2;
+}
+public HiveServer2 getHiveServer2() {
+  return hiveServer2;
+}
+  }
+
+  /**
+   * Used as default oom hook
+   */
+  private static class DefaultOomHook implements OomHookWithContext {

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456110)
Time Spent: 1h 10m  (was: 1h)

> Make HiveServer2 oom hook interface
> ---
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
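Under the interface proposed in HIVE-23800 above, a deployment-specific hook is just a class with a no-arg constructor named in a comma-separated config value (the review thread shows `ConfVars.HIVE_SERVER2_OOM_HOOKS`). A self-contained sketch of the reflective loading pattern, with HiveServer2 and the context type stubbed out and the hook class name illustrative:

```java
import java.lang.reflect.Constructor;

// Hedged sketch mirroring HiveServer2OomHookRunner.init(): split the
// configured class list, load each class, and instantiate it via its
// no-arg constructor. Hive types are stubbed out for self-containment.
public class OomHookDemo {

    public interface OomHookWithContext {
        void run(String context);   // stands in for OomHookContext
    }

    public static class LoggingOomHook implements OomHookWithContext {
        @Override
        public void run(String context) {
            System.out.println("oom hook fired for " + context);
        }
    }

    public static void main(String[] args) throws Exception {
        // In HS2 this string would come from the hooks config value.
        String csHooks = OomHookDemo.class.getName() + "$LoggingOomHook";
        for (String hookClass : csHooks.split(",")) {
            Class<?> clazz = Class.forName(hookClass.trim());
            Constructor<?> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            ((OomHookWithContext) ctor.newInstance()).run("hs2-demo");
        }
    }
}
```

Catching and logging per-hook instantiation failures, as the patch under review does, keeps one misconfigured class name from disabling the remaining hooks.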


[jira] [Work logged] (HIVE-23800) Make HiveServer2 oom hook interface

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?focusedWorklogId=456109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456109
 ]

ASF GitHub Bot logged work on HIVE-23800:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 13:22
Start Date: 08/Jul/20 13:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #1205:
URL: https://github.com/apache/hive/pull/1205#discussion_r451539648



##
File path: 
service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.server;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.hive.common.JavaUtils;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.List;
+
+public class HiveServer2OomHookRunner implements Runnable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(HiveServer2OomHookRunner.class);
+  private OomHookContext context;
+  private final List<OomHookWithContext> hooks = new ArrayList<>();
+
+  HiveServer2OomHookRunner(HiveServer2 hiveServer2, HiveConf hiveConf) {
+context = new OomHookContext(hiveServer2);
+// The hs2 has not been initialized yet, hiveServer2.getHiveConf() would 
be null
+init(hiveConf);
+  }
+
+  private void init(HiveConf hiveConf) {
+String csHooks = hiveConf.getVar(ConfVars.HIVE_SERVER2_OOM_HOOKS);
+if (!StringUtils.isBlank(csHooks)) {
+  String[] hookClasses = csHooks.split(",");
+  for (String hookClass : hookClasses) {
+try {
+  Class<?> clazz = JavaUtils.loadClass(hookClass.trim());
+  Constructor<?> ctor = clazz.getDeclaredConstructor();
+  ctor.setAccessible(true);
+  hooks.add((OomHookWithContext)ctor.newInstance());
+} catch (Exception e) {
+  LOG.error("Skip adding oom hook '" + hookClass + "'", e);

Review comment:
   done
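The pattern in the snippet above - instantiating hook classes listed in a comma-separated config value via reflection, and skipping entries that fail to load - can be sketched in isolation. A minimal sketch, assuming hypothetical names (`HookLoader`, `loadHooks`), with `Runnable` standing in for Hive's `OomHookWithContext` interface:

```java
import java.util.ArrayList;
import java.util.List;

public class HookLoader {
    // Split a comma-separated class list, instantiate each entry reflectively,
    // and skip (with a message) any entry that cannot be loaded or cast.
    public static List<Runnable> loadHooks(String csHooks) {
        List<Runnable> hooks = new ArrayList<>();
        if (csHooks == null || csHooks.trim().isEmpty()) {
            return hooks;
        }
        for (String name : csHooks.split(",")) {
            try {
                Class<?> clazz = Class.forName(name.trim());
                // no-arg constructor, mirroring the getDeclaredConstructor() call above
                hooks.add((Runnable) clazz.getDeclaredConstructor().newInstance());
            } catch (Exception e) {
                System.err.println("Skip adding hook '" + name.trim() + "': " + e);
            }
        }
        return hooks;
    }

    public static void main(String[] args) {
        // java.lang.Thread implements Runnable and has a public no-arg constructor;
        // the second entry does not exist and is skipped.
        List<Runnable> hooks = loadHooks(" java.lang.Thread , no.such.Clazz ");
        System.out.println(hooks.size()); // prints 1
    }
}
```

Failing entries are logged and skipped rather than aborting startup, which matches the error handling shown in the patch.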





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456109)
Time Spent: 1h  (was: 50m)

> Make HiveServer2 oom hook interface
> ---
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153599#comment-17153599
 ] 

Aasha Medhi commented on HIVE-23813:


http://ci.hive.apache.org/job/hive-flaky-check/67/


> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?focusedWorklogId=456097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456097
 ]

ASF GitHub Bot logged work on HIVE-22957:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:59
Start Date: 08/Jul/20 12:59
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1105:
URL: https://github.com/apache/hive/pull/1105#discussion_r451502201



##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
##
@@ -734,6 +734,21 @@ dropPartitionOperator
 EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | GREATERTHAN
 ;
 
+filterPartitionSpec
+:
+LPAREN filterPartitionVal (COMMA  filterPartitionVal )* RPAREN -> ^(TOK_PARTSPEC filterPartitionVal +)
+;
+
+filterPartitionVal
+:
+identifier filterPartitionOperator constant -> ^(TOK_PARTVAL identifier filterPartitionOperator constant)

Review comment:
   the old `partitionSpec` didn't make the constant mandatory
   ```
   identifier (EQUAL constant)? 
   ```
   
   were there any use cases of that?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -383,7 +375,29 @@ void findUnknownPartitions(Table table, Set<Path> partPaths,
 // now check the table folder and see if we find anything
 // that isn't in the metastore
 Set<Path> allPartDirs = new HashSet<>();
+Set<Path> partDirs = new HashSet<>();
+List<FieldSchema> partColumns = table.getPartitionKeys();
 checkPartitionDirs(tablePath, allPartDirs, Collections.unmodifiableList(getPartColNames(table)));
+
+if (filterExp != null) {
+  PartitionExpressionProxy expressionProxy = createExpressionProxy(conf);
+  List<String> paritions = new ArrayList<>();
+  for (Path path : allPartDirs) {
+// remove the table's path from the partition path
+// eg: /p1=1/p2=2/p3=3 ---> p1=1/p2=2/p3=3
+paritions.add(path.toString().substring(tablePath.toString().length() + 1));
+  }
+  // Remove all partition paths which do not match the filter expression.
+  expressionProxy.filterPartitionsByExpr(partColumns, filterExp,
+  conf.get(MetastoreConf.ConfVars.DEFAULTPARTITIONNAME.getVarname()), paritions);
+
+  // now the partition list will contain all the paths that match the filter expression.
+  // add them back to partDirs.
+  for (String path : paritions) {
+partDirs.add(new Path(tablePath.toString() + "/" + path));

Review comment:
   instead of concatenating with `/` use `new Path(parentPath, child)` - it's more portable
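The path manipulation the patch performs - relativize each partition directory against the table path, filter the relative names, then resolve the survivors back to absolute paths - can be sketched generically. The names below (`PartitionPathFilter`, `filterPartitionDirs`) are hypothetical, and a plain string predicate stands in for the metastore's expression-based filter:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PartitionPathFilter {
    // e.g. "/warehouse/t/p1=1/p2=2" -> "p1=1/p2=2" -> (filter) -> "/warehouse/t/p1=1/p2=2"
    public static List<String> filterPartitionDirs(String tablePath, Collection<String> partDirs,
                                                   Predicate<String> filter) {
        return partDirs.stream()
                .map(p -> p.substring(tablePath.length() + 1)) // drop "<tablePath>/" prefix
                .filter(filter)                                // keep only matching partitions
                .map(rel -> tablePath + "/" + rel)             // resolve back under the table
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/wh/t/p=1", "/wh/t/p=2");
        System.out.println(filterPartitionDirs("/wh/t", dirs, rel -> rel.equals("p=2")));
        // prints [/wh/t/p=2]
    }
}
```

As the review note says, concatenating with a literal `/` works on HDFS-style paths but `new Path(parent, child)` is the more portable construction in the real code.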

##
File path: itests/src/test/resources/testconfiguration.properties
##
@@ -222,6 +222,7 @@ mr.query.files=\
   mapjoin_subquery2.q,\
   mapjoin_test_outer.q,\
   masking_5.q,\
+  msck_repair_filter.q,\

Review comment:
   is there a reason that we run this test with mr?

##
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
##
@@ -1942,9 +1942,8 @@ metastoreCheck
 @after { popMsg(state); }
 : KW_MSCK (repair=KW_REPAIR)?
   (KW_TABLE tableName
-((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS))? |
-(partitionSpec)?)
--> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync? (partitionSpec*)?)
+((add=KW_ADD | drop=KW_DROP | sync=KW_SYNC) (parts=KW_PARTITIONS) (filterPartitionSpec)?)?)
+-> ^(TOK_MSCK $repair? tableName? $add? $drop? $sync? (filterPartitionSpec)?)

Review comment:
   I know it was here before - but let's fix this up:
   
   instead of separate add/drop/sync variables we could have `opt=(KW_ADD|KW_DROP|KW_SYNC)`? that will make the other end more readable as well

##
File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java
##
@@ -63,13 +67,24 @@ public void analyzeInternal(ASTNode root) throws SemanticException {
 }
 
 Table table = getTable(tableName);
-List<Map<String, String>> specs = getPartitionSpecs(table, root);
+Map<Integer, List<ExprNodeGenericFuncDesc>> partitionSpecs = getFullPartitionSpecs(root, table, conf, false);
+byte[] filterExp = null;
+if (partitionSpecs != null && !partitionSpecs.isEmpty()) {
+  // explicitly set expression proxy class to PartitionExpressionForMetastore since we intend to use the
+  // filterPartitionsByExpr of PartitionExpressionForMetastore for partition pruning down the line.
+  conf.set(MetastoreConf.ConfVars.EXPRESSION_PROXY_CLASS.getVarname(),

Review comment:
   I don't think this will work - this is the ql module, while `EXPRESSION_PROXY_CLASS` is a metastore conf key; in a remote metastore setup this set will probably have no effect...
   have you tried it?
   I think making a check and returning with an error that this feature is not available due to 

[jira] [Updated] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread Rossetti Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rossetti Wong updated HIVE-23815:
-
External issue URL:   (was: https://github.com/apache/hive/pull/1226)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Updated] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread Rossetti Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rossetti Wong updated HIVE-23815:
-
Description: This patch provides a way to get the statistics data of 
metastore's underlying datastore, like MySQL, Oracle and so on.  You can get 
the number of datastore reads and writes, the average time of transaction 
execution, the total active connection and so on.  (was: This patch provides a 
way to get the statistics data of metastore's underlying datastore, like MySQL, 
Oracl and so on.  You can get the number of datastore reads and writes, the 
average time of transaction execution, the total active connection and so on.)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=456093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456093
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:49
Start Date: 08/Jul/20 12:49
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 456093)
Time Spent: 20m  (was: 10m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number of 
> datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Updated] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23815:
--
Labels: pull-request-available  (was: )

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number of 
> datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=456092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456092
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:47
Start Date: 08/Jul/20 12:47
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1226:
URL: https://github.com/apache/hive/pull/1226


   





Issue Time Tracking
---

Worklog Id: (was: 456092)
Remaining Estimate: 0h
Time Spent: 10m

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number of 
> datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Assigned] (HIVE-23815) output statistics of underlying datastore

2020-07-08 Thread Rossetti Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rossetti Wong reassigned HIVE-23815:



> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number of 
> datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.





[jira] [Work logged] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?focusedWorklogId=456067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456067
 ]

ASF GitHub Bot logged work on HIVE-23813:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:11
Start Date: 08/Jul/20 12:11
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1223:
URL: https://github.com/apache/hive/pull/1223#discussion_r451486233



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ReplicationMetricsMaintTask.java
##
@@ -63,13 +63,12 @@ public void run() {
   if (!MetastoreConf.getBoolVar(conf, ConfVars.SCHEDULED_QUERIES_ENABLED)) {

Review comment:
   please correct the comment in `initialDelay` method as well

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ReplicationMetricsMaintTask.java
##
@@ -63,13 +63,12 @@ public void run() {
   if (!MetastoreConf.getBoolVar(conf, ConfVars.SCHEDULED_QUERIES_ENABLED)) {

Review comment:
   I think this class should depend on a different config knob

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -121,6 +121,7 @@ public void run() {
 ObjectMapper mapper = new ObjectMapper();

Review comment:
   is this method supposed to be fast? (because `ArrayList` was created for a specified size)
   ...anyway try not to throw away `ObjectMapper` instances right after use - `ObjectMapper`'s first-time-use cost can be high

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ReplicationMetricsMaintTask.java
##
@@ -63,13 +63,12 @@ public void run() {
   if (!MetastoreConf.getBoolVar(conf, ConfVars.SCHEDULED_QUERIES_ENABLED)) {
 return;
   }
-  LOG.debug("Cleaning up older Metrics");
   RawStore ms = HiveMetaStore.HMSHandler.getMSForConf(conf);
-  int maxRetainSecs = (int) TimeUnit.DAYS.toSeconds(MetastoreConf.getTimeVar(conf,
-ConfVars.REPL_METRICS_MAX_AGE, TimeUnit.DAYS));
+  int maxRetainSecs = (int) MetastoreConf.getTimeVar(conf, ConfVars.REPL_METRICS_MAX_AGE, TimeUnit.SECONDS);
+  LOG.info("Cleaning up Metrics older than {} ", maxRetainSecs);
   int deleteCnt = ms.deleteReplicationMetrics(maxRetainSecs);
   if (deleteCnt > 0L){

Review comment:
   nit: space before `{`
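The unit handling changed by the diff above - reading a duration expressed in days and converting it to seconds, versus asking the conf layer for seconds directly - hinges on `TimeUnit` conversions agreeing. A minimal sketch (hypothetical class name `RetentionDemo`; the conf lookup itself is omitted):

```java
import java.util.concurrent.TimeUnit;

public class RetentionDemo {
    // The old code read the value in DAYS and then called DAYS.toSeconds();
    // the fix requests SECONDS directly. For a whole-day retention the two
    // must produce the same number of seconds.
    public static long daysToSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    public static void main(String[] args) {
        System.out.println(daysToSeconds(7)); // prints 604800
    }
}
```

The bug risk the fix removes is double conversion: converting a value that was already in the target unit.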







Issue Time Tracking
---

Worklog Id: (was: 456067)
Time Spent: 40m  (was: 0.5h)

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=456064=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456064
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:04
Start Date: 08/Jul/20 12:04
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r451489380



##
File path: common/src/java/org/apache/hive/common/util/SuppressFBWarnings.java
##
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.common.util;
+
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+
+@Retention(RetentionPolicy.CLASS)
+public @interface SuppressFBWarnings {
+/**
+ * The set of FindBugs warnings that are to be suppressed in
+ * annotated element. The value can be a bug category, kind or pattern.
+ *
+ */
+String[] value() default {};
+
+/**
+ * Optional documentation of the reason why the warning is suppressed
+ */
+String justification() default "";
+}

Review comment:
   Sure, done :) 







Issue Time Tracking
---

Worklog Id: (was: 456064)
Time Spent: 1h 40m  (was: 1.5h)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=456062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456062
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:03
Start Date: 08/Jul/20 12:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r451488767



##
File path: common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java
##
@@ -135,10 +135,10 @@ public static Path internUriStringsInPath(Path path) {
 
   public static <K> Map<K, String> internValuesInMap(Map<K, String> map) {
 if (map != null) {
-  for (K key : map.keySet()) {
-String value = map.get(key);
+  for (Map.Entry<K, String> entry : map.entrySet()) {
+String value = entry.getValue();
 if (value != null) {
-  map.put(key, value.intern());
+  map.put(entry.getKey(), value.intern());

Review comment:
   Nice idea! I followed similar logic to check whether values are already interned in all the helper methods of the StringInternUtils class
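The idea discussed above - iterate `entrySet()` instead of `keySet()` plus `get()`, and only write back when `intern()` actually returns a different reference - can be sketched as follows. This is a minimal sketch of the reviewed approach, not Hive's actual `StringInternUtils` implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class InternUtil {
    // One lookup per entry (the FindBugs WMI_WRONG_MAP_ITERATOR fix), and a
    // reference check so already-interned values are not rewritten.
    public static <K> Map<K, String> internValuesInMap(Map<K, String> map) {
        if (map != null) {
            for (Map.Entry<K, String> entry : map.entrySet()) {
                String value = entry.getValue();
                if (value != null) {
                    String interned = value.intern();
                    if (interned != value) {   // skip values already interned
                        entry.setValue(interned);
                    }
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k", new String("v"));           // deliberately non-interned copy
        internValuesInMap(m);
        System.out.println(m.get("k") == "v"); // prints true: now the pooled literal
    }
}
```

Using `entry.setValue` rather than `map.put` also avoids touching the map's structure mid-iteration.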







Issue Time Tracking
---

Worklog Id: (was: 456062)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=456063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456063
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:03
Start Date: 08/Jul/20 12:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r451489258



##
File path: common/src/java/org/apache/hadoop/hive/conf/Validator.java
##
@@ -357,14 +357,15 @@ public String validate(String value) {
   final Path path = FileSystems.getDefault().getPath(value);
   if (path == null && value != null) {
 return String.format("Path '%s' provided could not be located.", value);
-  }
-  final boolean isDir = Files.isDirectory(path);
-  final boolean isWritable = Files.isWritable(path);
-  if (!isDir) {
-return String.format("Path '%s' provided is not a directory.", value);
-  }
-  if (!isWritable) {
-return String.format("Path '%s' provided is not writable.", value);
+  } else if (path != null) {
+final boolean isDir = Files.isDirectory(path);
+final boolean isWritable = Files.isWritable(path);
+if (!isDir) {
+  return String.format("Path '%s' provided is not a directory.", value);
+}
+if (!isWritable) {
+  return String.format("Path '%s' provided is not writable.", value);
+}
   }
   return null;

Review comment:
   Refactored the code to return early when the argument is actually null; the following logic is now simplified to the null and non-null path cases
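The early-return refactor described in that reply can be sketched end to end. This is a simplified standalone version of the validator pattern in the diff (error message on failure, `null` on success); it is not the exact Hive `Validator` code:

```java
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class WritableDirValidator {
    // Reject null first, so no later branch can dereference a null Path.
    public static String validate(String value) {
        if (value == null) {
            return "Path provided could not be located.";
        }
        Path path = FileSystems.getDefault().getPath(value);
        if (!Files.isDirectory(path)) {
            return String.format("Path '%s' provided is not a directory.", value);
        }
        if (!Files.isWritable(path)) {
            return String.format("Path '%s' provided is not writable.", value);
        }
        return null; // valid: an existing, writable directory
    }

    public static void main(String[] args) {
        // The JVM temp dir is a writable directory, so this prints null.
        System.out.println(validate(System.getProperty("java.io.tmpdir")));
        System.out.println(validate("/no/such/dir"));
    }
}
```

The early return is what removes the FindBugs nullness warning: every `Files.*` call now runs on a provably non-null path.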







Issue Time Tracking
---

Worklog Id: (was: 456063)
Time Spent: 1.5h  (was: 1h 20m)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=456061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456061
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 12:00
Start Date: 08/Jul/20 12:00
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r451487709



##
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##
@@ -926,8 +925,7 @@ public static File createLocalDirsTempFile(Configuration conf, String prefix, St
* delete a temporary file and remove it from delete-on-exit hook.
*/
   public static boolean deleteTmpFile(File tempFile) {
-if (tempFile != null) {
-  tempFile.delete();
+if (tempFile != null && tempFile.delete()) {

Review comment:
   Good catch, fixed
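The fix in that hunk - checking the boolean returned by `File.delete()` instead of discarding it - is the classic FindBugs `RV_RETURN_VALUE_IGNORED` repair. A minimal sketch (hypothetical class name `TmpFiles`; the real Hive method also removes the file from a delete-on-exit hook, which is omitted here):

```java
import java.io.File;

public class TmpFiles {
    // Report success only when delete() actually succeeded; null-safe.
    public static boolean deleteTmpFile(File tempFile) {
        return tempFile != null && tempFile.delete();
    }

    public static void main(String[] args) {
        System.out.println(deleteTmpFile(null));                          // prints false
        System.out.println(deleteTmpFile(new File("/no/such/file.tmp"))); // prints false
    }
}
```

Short-circuiting on `tempFile != null` keeps the original null guard while folding the delete into the returned result.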







Issue Time Tracking
---

Worklog Id: (was: 456061)
Time Spent: 1h 10m  (was: 1h)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Work started] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23069 started by Pravin Sinha.
---
> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with very large number of table/partitions, such iterator may 
> cause HS2 process to go OOM.
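The OOM risk described in the issue comes from materializing every table/partition entry up front before iterating. A minimal sketch of the alternative - a batched, lazily-fetched iterator whose memory use is bounded by the batch size, not the database size (all names hypothetical; integers stand in for metastore entries):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyBatchIterator implements Iterator<Integer> {
    private final int total;      // stand-in for "entries in the metastore"
    private final int batchSize;
    private final Deque<Integer> batch = new ArrayDeque<>();
    private int fetched = 0;

    public LazyBatchIterator(int total, int batchSize) {
        this.total = total;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (batch.isEmpty() && fetched < total) {
            // hypothetical "fetch the next bounded batch from the store" step
            for (int i = 0; i < batchSize && fetched < total; i++) {
                batch.add(fetched++);
            }
        }
        return !batch.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.poll();
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new LazyBatchIterator(5, 2);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count); // prints 5
    }
}
```

At most one batch lives in memory at a time, so a very large database no longer translates into a proportionally large heap in HS2.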





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Description: 
Currently the iterator used while copying table data is memory based. In case 
of a database with very large number of table/partitions, such iterator may 
cause HS2 process to go OOM.

Also introduces a config option to run data copy tasks during repl load 
operation.

  was:Currently the iterator used while copying table data is memory based. In 
case of a database with very large number of table/partitions, such iterator 
may cause HS2 process to go OOM.


> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with very large number of table/partitions, such iterator may 
> cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Status: Patch Available  (was: In Progress)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with very large number of table/partitions, such iterator may 
> cause HS2 process to go OOM.





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Attachment: HIVE-23069.01.patch

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with very large number of table/partitions, such iterator may 
> cause HS2 process to go OOM.





[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23069:
--
Labels: pull-request-available  (was: )

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=456054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456054
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 11:48
Start Date: 08/Jul/20 11:48
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1225:
URL: https://github.com/apache/hive/pull/1225


   …n. Config option to execute data copy during load.
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456054)
Remaining Estimate: 0h
Time Spent: 10m

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-07-08 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153523#comment-17153523
 ] 

Syed Shameerur Rahman commented on HIVE-22957:
--

[~jcamachorodriguez] [~kgyrtkirk] ping for review request!

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?focusedWorklogId=456036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456036
 ]

ASF GitHub Bot logged work on HIVE-23805:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 11:29
Start Date: 08/Jul/20 11:29
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1224:
URL: https://github.com/apache/hive/pull/1224#discussion_r451472066



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
##
@@ -730,7 +729,10 @@ public boolean validateInput(FileSystem fs, HiveConf conf,
   ? AcidOperationalProperties.parseString(txnProperties) : null;
 
   String value = conf.get(ValidWriteIdList.VALID_WRITEIDS_KEY);
-  writeIdList = value == null ? new ValidReaderWriteIdList() : new ValidReaderWriteIdList(value);
+  writeIdList = new ValidReaderWriteIdList(value);
+
+  value = conf.get(ValidTxnList.VALID_TXNS_KEY);
+  validTxnList = new ValidReadTxnList(value);

Review comment:
   Will this help parse the TxnList only once when we have multiple partitions
with multiple files?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456036)
Time Spent: 40m  (was: 0.5h)

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> The AM spends a good amount of CPU parsing the same validtxnlist multiple times.
>  
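The fix described above amounts to parsing the serialized transaction list once and reusing the result. A hedged sketch of that caching idea (not the actual `AcidUtils` code; `ParsedTxnList` is an illustrative stand-in for `ValidReadTxnList`):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: parse the serialized transaction list once per distinct
// string and reuse the parsed object, instead of re-running readFromString
// for every partition/file.
public class TxnListCache {
  static int parseCount = 0; // counts how often parsing actually ran

  public static final class ParsedTxnList {
    public final String raw;
    ParsedTxnList(String raw) {
      this.raw = raw; // a real implementation would parse highWatermark,
      parseCount++;   // open/aborted txn ids, etc. out of this string
    }
  }

  private final Map<String, ParsedTxnList> cache = new ConcurrentHashMap<>();

  // Each distinct serialized value is parsed at most once.
  public ParsedTxnList get(String serialized) {
    return cache.computeIfAbsent(serialized, ParsedTxnList::new);
  }
}
```

With such a cache, repeated `getAcidState` calls over many partitions would hit the already-parsed object instead of burning CPU on re-parsing.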



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?focusedWorklogId=456031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-456031
 ]

ASF GitHub Bot logged work on HIVE-23805:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 11:26
Start Date: 08/Jul/20 11:26
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1224:
URL: https://github.com/apache/hive/pull/1224#discussion_r451470405



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -1262,8 +1262,8 @@ public static boolean isAcid(FileSystem fileSystem, Path 
directory,
* @throws IOException on filesystem errors
*/
   public static Directory getAcidState(FileSystem fileSystem, Path candidateDirectory, Configuration conf,
-  ValidWriteIdList writeIdList, Ref<Boolean> useFileIds, boolean ignoreEmptyFiles) throws IOException {
-    return getAcidState(fileSystem, candidateDirectory, conf, writeIdList, useFileIds, ignoreEmptyFiles, null);
+  ValidWriteIdList writeIdList, ValidTxnList validTxnList, Ref<Boolean> useFileIds, boolean ignoreEmptyFiles) throws IOException {
+    return getAcidState(fileSystem, candidateDirectory, conf, writeIdList, validTxnList, useFileIds, ignoreEmptyFiles, null);

Review comment:
   Maybe create a separate class for AcidState?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 456031)
Time Spent: 0.5h  (was: 20m)

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> The AM spends a good amount of CPU parsing the same validtxnlist multiple times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?focusedWorklogId=455998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455998
 ]

ASF GitHub Bot logged work on HIVE-23805:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 10:36
Start Date: 08/Jul/20 10:36
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1224:
URL: https://github.com/apache/hive/pull/1224#discussion_r451446059



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -1262,8 +1262,8 @@ public static boolean isAcid(FileSystem fileSystem, Path 
directory,
* @throws IOException on filesystem errors
*/
   public static Directory getAcidState(FileSystem fileSystem, Path candidateDirectory, Configuration conf,
-  ValidWriteIdList writeIdList, Ref<Boolean> useFileIds, boolean ignoreEmptyFiles) throws IOException {
-    return getAcidState(fileSystem, candidateDirectory, conf, writeIdList, useFileIds, ignoreEmptyFiles, null);
+  ValidWriteIdList writeIdList, ValidTxnList validTxnList, Ref<Boolean> useFileIds, boolean ignoreEmptyFiles) throws IOException {
+    return getAcidState(fileSystem, candidateDirectory, conf, writeIdList, validTxnList, useFileIds, ignoreEmptyFiles, null);

Review comment:
   I will create a separate issue to change getAcidState to take just one
parameter with a builder pattern, because this is getting out of hand
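The builder refactor suggested here would collapse the growing argument list into a single request object. A rough sketch under assumed names (`AcidStateRequest` and its simplified `String` fields are illustrative only; the real signature uses `FileSystem`, `ValidWriteIdList`, `ValidTxnList`, etc.):

```java
// Illustrative parameter object with a builder for the many getAcidState
// arguments. Names and field types are assumptions, not real Hive classes.
public class AcidStateRequest {
  private final String candidateDirectory;
  private final String writeIdList;
  private final String validTxnList;
  private final boolean ignoreEmptyFiles;

  private AcidStateRequest(Builder b) {
    this.candidateDirectory = b.candidateDirectory;
    this.writeIdList = b.writeIdList;
    this.validTxnList = b.validTxnList;
    this.ignoreEmptyFiles = b.ignoreEmptyFiles;
  }

  public static Builder builder(String candidateDirectory) {
    return new Builder(candidateDirectory);
  }

  public String candidateDirectory() { return candidateDirectory; }
  public String writeIdList() { return writeIdList; }
  public String validTxnList() { return validTxnList; }
  public boolean ignoreEmptyFiles() { return ignoreEmptyFiles; }

  public static final class Builder {
    private final String candidateDirectory; // required
    private String writeIdList;              // optional
    private String validTxnList;             // optional
    private boolean ignoreEmptyFiles;        // defaults to false

    Builder(String candidateDirectory) { this.candidateDirectory = candidateDirectory; }

    public Builder writeIdList(String w) { this.writeIdList = w; return this; }
    public Builder validTxnList(String v) { this.validTxnList = v; return this; }
    public Builder ignoreEmptyFiles(boolean b) { this.ignoreEmptyFiles = b; return this; }

    public AcidStateRequest build() { return new AcidStateRequest(this); }
  }
}
```

Adding a new option then means one new builder method instead of yet another overload of `getAcidState`.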





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455998)
Time Spent: 20m  (was: 10m)

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> AM spends good amount of CPU parsing the same validtxnlist multiple times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23805:
--
Labels: pull-request-available  (was: )

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> The AM spends a good amount of CPU parsing the same validtxnlist multiple times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?focusedWorklogId=455997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455997
 ]

ASF GitHub Bot logged work on HIVE-23805:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 10:35
Start Date: 08/Jul/20 10:35
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1224:
URL: https://github.com/apache/hive/pull/1224


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455997)
Remaining Estimate: 0h
Time Spent: 10m

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> The AM spends a good amount of CPU parsing the same validtxnlist multiple times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23805) ValidReadTxnList need not be constructed multiple times in AcidUtils::getAcidState

2020-07-08 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-23805:
--

Assignee: Peter Varga

> ValidReadTxnList need not be constructed multiple times in 
> AcidUtils::getAcidState 
> ---
>
> Key: HIVE-23805
> URL: https://issues.apache.org/jira/browse/HIVE-23805
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Peter Varga
>Priority: Major
> Attachments: Screenshot 2020-07-06 at 4.53.44 PM.png
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1273]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1286]
>  
> {code:java}
> String s = conf.get(ValidTxnList.VALID_TXNS_KEY);
> if (!Strings.isNullOrEmpty(s)) {
>   ...
>   ...
>   validTxnList.readFromString(s);
> } {code}
>  
>  
> !Screenshot 2020-07-06 at 4.53.44 PM.png|width=610,height=621!
> The AM spends a good amount of CPU parsing the same validtxnlist multiple times.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?focusedWorklogId=455996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455996
 ]

ASF GitHub Bot logged work on HIVE-23813:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 10:33
Start Date: 08/Jul/20 10:33
Worklog Time Spent: 10m 
  Work Description: aasha commented on pull request #1223:
URL: https://github.com/apache/hive/pull/1223#issuecomment-655435804


   > @aasha: Could you please verify that the flaky tests are fixed with 
running the flaky check tester jenkins 
job?http://ci.hive.apache.org/job/hive-flaky-check/
   > 
   > Thanks,
   > Peter
   
   Yes, already triggered that. 23 runs and all good till now. Will monitor that:
http://ci.hive.apache.org/job/hive-flaky-check/67/console



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455996)
Time Spent: 0.5h  (was: 20m)

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?focusedWorklogId=455964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455964
 ]

ASF GitHub Bot logged work on HIVE-23813:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 09:50
Start Date: 08/Jul/20 09:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1223:
URL: https://github.com/apache/hive/pull/1223#issuecomment-655415683


   @aasha: Could you please verify that the flaky tests are fixed with running 
the flaky check tester jenkins 
job?http://ci.hive.apache.org/job/hive-flaky-check/
   
   Thanks,
   Peter



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455964)
Time Spent: 20m  (was: 10m)

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-20441) NPE in ExprNodeGenericFuncDesc when hive.allow.udf.load.on.demand is set to true

2020-07-08 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153437#comment-17153437
 ] 

Zhihua Deng commented on HIVE-20441:


The problem may still be there in trunk. [~BIGrey], are you still working on
this?

> NPE in ExprNodeGenericFuncDesc  when hive.allow.udf.load.on.demand is set to 
> true
> -
>
> Key: HIVE-20441
> URL: https://issues.apache.org/jira/browse/HIVE-20441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, HiveServer2
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Attachments: HIVE-20441.1.patch, HIVE-20441.2.patch, 
> HIVE-20441.3.patch, HIVE-20441.4.patch, HIVE-20441.patch
>
>
> When hive.allow.udf.load.on.demand is set to true and hiveserver2 has been 
> started, the new created function from other clients or hiveserver2 will be 
> loaded from the metastore at the first time. 
> When the udf is used in where clause, we got a NPE like:
> {code:java}
> Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAP
> SHOT]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHO
> T]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:542)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNA
> PSHOT]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
>  ~[hive-service-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1104)
>  ~[hive-exec-2.
> 3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1359)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.
> 3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-2.3.4-SNAPSHOT.jar:2.3.4-SNAPSHOT]
> at 
> 

[jira] [Work logged] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?focusedWorklogId=455928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455928
 ]

ASF GitHub Bot logged work on HIVE-23813:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 08:43
Start Date: 08/Jul/20 08:43
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1223:
URL: https://github.com/apache/hive/pull/1223


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455928)
Remaining Estimate: 0h
Time Spent: 10m

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23813:
--
Labels: pull-request-available  (was: )

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23813.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23813 started by Aasha Medhi.
--
> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23813.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23813:
---
Attachment: HIVE-23813.01.patch
Status: Patch Available  (was: In Progress)

> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23813.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=455917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-455917
 ]

ASF GitHub Bot logged work on HIVE-23814:
-

Author: ASF GitHub Bot
Created on: 08/Jul/20 08:23
Start Date: 08/Jul/20 08:23
Worklog Time Spent: 10m 
  Work Description: miklosgergely opened a new pull request #1222:
URL: https://github.com/apache/hive/pull/1222


   Driver is now cut down to its minimal size by extracting all of its
sub-tasks into separate classes. The rest should be cleaned up by
   
   - moving out some smaller parts of the code into sub-task and utility
classes wherever it is still possible
   - cutting large functions into meaningful and manageable parts
   - re-ordering the functions to follow the order of processing
   - fixing checkstyle issues



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 455917)
Remaining Estimate: 0h
Time Spent: 10m

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its
> sub-tasks into separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code into sub-task and utility
> classes wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23814) Clean up Driver

2020-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23814:
--
Labels: pull-request-available  (was: )

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its
> sub-tasks into separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code into sub-task and utility
> classes wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23814) Clean up Driver

2020-07-08 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reassigned HIVE-23814:
-


> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>
> Driver is now cut down to its minimal size by extracting all of its
> sub-tasks into separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code into sub-task and utility
> classes wherever it is still possible
>  * cutting large functions into meaningful and manageable parts
>  * re-ordering the functions to follow the order of processing
>  * fixing checkstyle issues
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23813) Fix Flaky tests due to JDO ConnectionException

2020-07-08 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-23813:
--


> Fix Flaky tests due to JDO ConnectionException
> --
>
> Key: HIVE-23813
> URL: https://issues.apache.org/jira/browse/HIVE-23813
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23762) TestPigHBaseStorageHandler tests are flaky

2020-07-08 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153342#comment-17153342
 ] 

Aasha Medhi commented on HIVE-23762:


Will be fixed as part of https://issues.apache.org/jira/browse/HIVE-23813

> TestPigHBaseStorageHandler tests are flaky
> --
>
> Key: HIVE-23762
> URL: https://issues.apache.org/jira/browse/HIVE-23762
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Most likely caused by HIVE-23668



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-22247) HiveHFileOutputFormat throws FileNotFoundException when partition's task output empty

2020-07-08 Thread xiepengjie (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22247 started by xiepengjie.
-
> HiveHFileOutputFormat throws FileNotFoundException when partition's task 
> output empty
> -
>
> Key: HIVE-22247
> URL: https://issues.apache.org/jira/browse/HIVE-22247
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.2.0, 3.0.0
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>
> When a partition's task output is empty, HiveHFileOutputFormat throws a 
> FileNotFoundException like this:
> {code:java}
> 2019-09-24 19:15:55,886 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing... 
> 2019-09-24 19:15:55,886 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 0
> 2019-09-24 19:15:55,886 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/02_0
> 2019-09-24 19:15:55,886 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.02_0
> 2019-09-24 19:15:55,886 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/02_0
> 2019-09-24 19:15:55,915 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
> Committer Algorithm version is 1
> 2019-09-24 19:15:55,954 INFO [main] 
> org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is 
> deprecated. Instead, use io.native.lib.available
> 2019-09-24 19:15:56,089 ERROR [main] ExecReducer: Hit error while closing 
> operators - failing tree
> 2019-09-24 19:15:56,090 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators: java.io.FileNotFoundException: File 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.02_0
>  does not exist.
>   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.FileNotFoundException: File 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.02_0
>  does not exist.
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:200)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1016)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
>   ... 7 more
> Caused by: java.io.FileNotFoundException: File 
> hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.02_0
>  does not exist.
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:880)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:109)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:938)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:934)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:945)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1592)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1632)
>   at 
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:153)
>   at 
>
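The stack trace above shows HiveHFileOutputFormat calling listStatus on a task's temp output directory that was never created (the task wrote zero records). The general pattern for a fix is to check for the directory's existence before listing it and to treat a missing directory as empty output. The sketch below is hypothetical and uses java.nio.file as a stand-in for Hadoop's FileSystem API (fs.exists / fs.listStatus); the class and method names are illustrative, not the actual patch for HIVE-22247.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SafeOutputClose {

    // Returns the entries under taskOutputDir, or an empty array when the
    // directory was never created (e.g. the task wrote zero records). Calling
    // a list operation unconditionally is what raises FileNotFoundException.
    static Path[] listTaskOutput(Path taskOutputDir) throws IOException {
        if (!Files.exists(taskOutputDir)) {
            // Guard: empty task output -> no temp dir -> report no files
            return new Path[0];
        }
        try (var stream = Files.list(taskOutputDir)) {
            return stream.toArray(Path[]::new);
        }
    }

    public static void main(String[] args) throws IOException {
        // A directory that does not exist, mimicking an empty task's temp path
        Path missing = Paths.get("no-such-task-output-dir");
        System.out.println(listTaskOutput(missing).length);
    }
}
```

With this guard, closing the writer for an empty partition simply produces no HFiles instead of failing the whole reduce task.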