[jira] [Commented] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread XixiHua (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361395#comment-17361395
 ] 

XixiHua commented on HIVE-25239:


Yes, this still needs to be fixed; the ticket is not closed.

In the HIVE-2250 ticket, the meaning of 'compressed' is confusing. I think this 
is why this bug has not been fixed.

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Fix For: 4.0.0
>
> Attachments: HIVE-25239.01.patch, image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}
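
One way to double-check that the data itself is Snappy-compressed, independent 
of what 'desc formatted' reports, is to dump the ORC file metadata (the 
warehouse path and file name below are illustrative):

{code}
hive --orcfiledump /user/hive/warehouse/lgm.db/test_tbl/000000_0
# the file metadata should show "Compression: SNAPPY",
# even though 'desc formatted' shows Compressed: No
{code}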



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24969) Predicates may be removed when decorrelating subqueries with lateral

2021-06-10 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-24969:
---
Summary: Predicates may be removed when decorrelating subqueries with 
lateral  (was: Predicates are removed by PPD when left semi join followed by 
lateral view)

> Predicates may be removed when decorrelating subqueries with lateral
> 
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Step to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]. When pushing them to the RS followed by the LVJ, none of them are 
> pushed, and the candidates of logitem are finally removed by default, which 
> leads to the wrong result.
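
As an untested sketch of a workaround while the fix is pending (assuming the 
adid/adId join key shown above), the IN subquery can be decorrelated by hand 
into an explicit join, so PPD no longer has to push the filters through the 
RS -> LVJ path:

{code:sql}
-- untested workaround sketch: express the IN subquery as an explicit join
select count(distinct logItem.triggerId)
from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
join (select distinct adId
      from ad_info
      where subAccountId in (16010, 14863)) ad
  on logItem.adid = ad.adId
where logItem.dsp in ('delivery', 'ocpa')
  and logItem.iswin = true;
{code}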



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread GuangMing Lu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361387#comment-17361387
 ] 

GuangMing Lu commented on HIVE-25239:
-

Hi 
[XixiHua|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=honeyaya], 
it has not been solved; you can test on the master branch. 'Compressed' should 
be set from the table's compression attributes when the table is created.
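
A minimal sketch of what setting the attribute at create time could look like 
(hypothetical helper, not the attached HIVE-25239.01.patch; StorageDescriptor 
and Table are the metastore Thrift classes):

{code:java}
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;

// Hypothetical helper, not the attached patch: derive the metastore
// 'compressed' flag from the table's ORC compression property.
public final class CompressedFlagUtil {
  private CompressedFlagUtil() {}

  public static void syncCompressedFlag(Table table) {
    StorageDescriptor sd = table.getSd();
    String orcCodec = table.getParameters().get("orc.compress");
    boolean isCompressed = orcCodec != null && !"NONE".equalsIgnoreCase(orcCodec);
    sd.setCompressed(isCompressed);
  }
}
{code}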

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Fix For: 4.0.0
>
> Attachments: HIVE-25239.01.patch, image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu reassigned HIVE-25239:
---

Assignee: (was: GuangMing Lu)

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Fix For: 4.0.0
>
> Attachments: HIVE-25239.01.patch, image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25239:

   Attachment: HIVE-25239.01.patch
Fix Version/s: 4.0.0
 Assignee: GuangMing Lu
   Status: Patch Available  (was: Open)

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Assignee: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Fix For: 4.0.0
>
> Attachments: HIVE-25239.01.patch, image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-2250) "DESCRIBE EXTENDED table_name" shows inconsistent compression information.

2021-06-10 Thread XixiHua (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361377#comment-17361377
 ] 

XixiHua commented on HIVE-2250:
---

This question still confuses Hive users, see 
https://issues.apache.org/jira/browse/HIVE-25239

How about not printing it in the describe extended/formatted output? 
[~qwertymaniac] [~Yibing] [~viktor.gerdin]

> "DESCRIBE EXTENDED table_name" shows inconsistent compression information.
> --
>
> Key: HIVE-2250
> URL: https://issues.apache.org/jira/browse/HIVE-2250
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Diagnosability
>Affects Versions: 0.7.0
> Environment: RHEL, Full Cloudera stack
>Reporter: Travis Powell
>Assignee: subramanian raghunathan
>Priority: Critical
> Attachments: HIVE-2250.patch
>
>
> Commands executed in this order:
> user@node # hive
> hive> SET hive.exec.compress.output=true; 
> hive> SET io.seqfile.compression.type=BLOCK;
> hive> CREATE TABLE table_name ( [...] ) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\t' STORED AS SEQUENCEFILE;
> hive> CREATE TABLE staging_table ( [...] ) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\t';
> hive> LOAD DATA LOCAL INPATH 'file:///root/input/' OVERWRITE INTO TABLE 
> staging_table;
> hive> INSERT OVERWRITE TABLE table_name SELECT * FROM staging_table;
> (Map reduce job to change to sequence file...)
> hive> DESCRIBE EXTENDED table_name;
> Detailed Table Information  Table(tableName:table_name, 
> dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, 
> type:string, comment:null), FieldSchema(name:remote_address, type:string, 
> comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), 
> FieldSchema(name:canister_session_id, type:bigint, comment:null), 
> FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, 
> type:string, comment:null), FieldSchema(name:tltvid, type:string, 
> comment:null), FieldSchema(name:canister_server, type:string, comment:null), 
> FieldSchema(name:session_timestamp, type:string, comment:null), 
> FieldSchema(name:session_duration, type:string, comment:null), 
> FieldSchema(name:hit_count, type:bigint, comment:null), 
> FieldSchema(name:http_user_agent, type:string, comment:null), 
> FieldSchema(name:extractid, type:bigint, comment:null), 
> FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, 
> type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], 
> location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, 
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=   , field.delim=
> *** SEE ABOVE: Compression is set to FALSE, even though the contents of the 
> table are compressed.
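
The root of the inconsistency is that the two session settings in the repro 
control compression of job output files, while the metastore 'compressed' flag 
on the StorageDescriptor is fixed at table-creation time; a minimal 
illustration:

{code:sql}
-- These settings compress the files written by subsequent jobs ...
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
-- ... but nothing updates the table's StorageDescriptor, so
-- DESCRIBE EXTENDED table_name keeps reporting compressed:false.
{code}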



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread XixiHua (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361375#comment-17361375
 ] 

XixiHua commented on HIVE-25239:


Hi, this still needs to be fixed; see https://issues.apache.org/jira/browse/HIVE-2250

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Attachments: image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25239) Create the compression table but the properties Compressed is No

2021-06-10 Thread GuangMing Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GuangMing Lu updated HIVE-25239:

Summary: Create the compression table but the properties Compressed is No  
(was: Create the compression table but the compressed properties are no)

> Create the compression table but the properties Compressed is No
> 
>
> Key: HIVE-25239
> URL: https://issues.apache.org/jira/browse/HIVE-25239
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: GuangMing Lu
>Priority: Major
>  Labels: easyfix
> Attachments: image-2021-06-11-10-49-25-710.png
>
>
> Create an ORC table with Snappy compression, then run 'desc formatted' on the 
> table: 'Compressed' shows No, but it should display YES.
> {quote}create database lgm;
> create table lgm.test_tbl(
>  f1 int,
>  f2 string
> ) stored as orc
> TBLPROPERTIES("orc.compress"="snappy");
> desc formatted lgm.test_tbl;
> !image-2021-06-11-10-49-25-710.png!
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-06-10 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361327#comment-17361327
 ] 

Matt McCline commented on HIVE-25140:
-

[~kgyrtkirk] I have thought about this. I think using something like an 
aspect-oriented approach would make the Distributed Tracing (DT) feature much 
weaker, since to add Spans you would have to learn a new tool, and DT could be 
out of sight and probably get neglected. The pattern of code used to create 
Spans is well established. OpenTelemetry (OTL) and its predecessors OpenTracing 
and OpenCensus use this pattern. You see the pattern in the DT books and in 
numerous tutorials on the web. And OTL supports many programming languages.

Also, I do not think it adds a lot of code. Yes, it does sprinkle changes across 
the code base. The very nature of manually instrumenting code like Hive for 
tracing is to start at the top of execution (e.g. BeeLine's SQL statement) and 
judiciously look for large areas of execution where a Span would provide 
benefit. The decision process for adding a new Span becomes a design process. I 
think it will be good for people to encounter Span creation in the code and 
explore what it does. With IDE autocomplete it is easy to add a new Span and go 
see how it looks in the UI.
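
For context, the well-established span-creation pattern referred to above looks 
roughly like this in the OpenTelemetry Java API (a generic illustration, not 
code from the HIVE-25140 patch; the span name and helper are made up):

{code:java}
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

// Generic OpenTelemetry span pattern; names are illustrative.
final class TracedExecution {
  static void runTraced(Tracer tracer, Runnable work) {
    Span span = tracer.spanBuilder("ExecuteStatement").startSpan();
    try (Scope ignored = span.makeCurrent()) {
      work.run();                      // the work being traced
    } catch (Throwable t) {
      span.recordException(t);         // record the failure on the span
      span.setStatus(StatusCode.ERROR);
      throw t;
    } finally {
      span.end();                      // always end the span
    }
  }
}
{code}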

> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch, 
> HIVE-25140.03.patch
>
>
> Infrastructure except exporters to Jaeger or OpenTelemetry (OTL), due to 
> Thrift and protobuf version conflicts; a logging-only exporter is used.
> There are Spans for BeeLine and Hive Server 2. The code was developed on 
> branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
> time due to major metastore code refactoring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25238) Make SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-25238:

Description: 
When starting a Jetty HTTP server, one can explicitly exclude certain (insecure)
SSL cipher suites. This can be especially important when Hive
needs to be compliant with security regulations. We need to add properties to 
support this for the Hive Web UI and HiveServer2.
For the Hive binary CLI server, we can configure which SSL cipher suites to include. 

  was:
When starting a Jetty HTTP server, one can explicitly exclude certain (insecure)
SSL cipher suites. This can be especially important when Hive
needs to be compliant with security regulations. We need to add properties to 
support this for the Hive Web UI and HiveServer2.


> Make SSL cipher suites configurable for Hive Web UI and HS2
> ---
>
> Key: HIVE-25238
> URL: https://issues.apache.org/jira/browse/HIVE-25238
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Web UI
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>
> When starting a Jetty HTTP server, one can explicitly exclude certain 
> (insecure)
> SSL cipher suites. This can be especially important when Hive
> needs to be compliant with security regulations. We need to add properties to 
> support this for the Hive Web UI and HiveServer2.
> For the Hive binary CLI server, we can configure which SSL cipher suites to 
> include. 
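
For reference, excluding (or including) cipher suites on an embedded Jetty 
server is a one-liner on the SslContextFactory. This is a generic Jetty sketch, 
not the Hive patch itself, and the suite patterns are examples only:

{code:java}
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Generic Jetty sketch: exclude known-weak suites, or pin an allow-list.
// Jetty treats the entries as regular expressions.
final class SslConfig {
  static SslContextFactory.Server factory() {
    SslContextFactory.Server ssl = new SslContextFactory.Server();
    ssl.setExcludeCipherSuites("^TLS_RSA_.*$", "^.*_(MD5|SHA|SHA1)$");
    // For an include list instead (e.g. for the binary CLI transport):
    // ssl.setIncludeCipherSuites("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256");
    return ssl;
  }
}
{code}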



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25238) Make SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-25238:

Summary: Make SSL cipher suites configurable for Hive Web UI and HS2  (was: 
Make excluded SSL cipher suites configurable for Hive Web UI and HS2)

> Make SSL cipher suites configurable for Hive Web UI and HS2
> ---
>
> Key: HIVE-25238
> URL: https://issues.apache.org/jira/browse/HIVE-25238
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Web UI
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>
> When starting a Jetty HTTP server, one can explicitly exclude certain 
> (insecure)
> SSL cipher suites. This can be especially important when Hive
> needs to be compliant with security regulations. We need to add properties to 
> support this for the Hive Web UI and HiveServer2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25214) Add hive authorization support for Data connectors.

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25214:
--
Labels: pull-request-available  (was: )

> Add hive authorization support for Data connectors.
> ---
>
> Key: HIVE-25214
> URL: https://issues.apache.org/jira/browse/HIVE-25214
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Dantong Dong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to add authorization support for data connectors in Hive. The default 
> behavior should be:
> 1) Connectors can be created/dropped by users in the admin role.
> 2) Connectors have READ and WRITE permissions.
> *   READ permission is required to fetch a connector object or fetch all 
> connector names. So to create a REMOTE database using a connector, users will 
> need READ permission on the connector. DDL queries like "show connectors" and 
> "describe <connector>" will check for read access on the connector as well.
> *   WRITE permission is required to alter/drop a connector. DDL queries 
> like "alter connector" and "drop connector" will need WRITE access on the 
> connector.
> With this support in place, Ranger can integrate with it.
>
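
Assuming the connector DDL that ships with the data-connector feature (the 
statement syntax below is illustrative and should be checked against the Hive 
docs), the permission mapping above plays out roughly like this:

{code:sql}
-- admin role: create/drop connectors
CREATE CONNECTOR mysql_conn TYPE 'mysql' URL 'jdbc:mysql://host:3306';
DROP CONNECTOR mysql_conn;

-- READ on the connector:
SHOW CONNECTORS;
DESCRIBE CONNECTOR mysql_conn;
CREATE REMOTE DATABASE rdb USING mysql_conn
WITH DBPROPERTIES ("connector.remoteDbName"="db1");

-- WRITE on the connector:
ALTER CONNECTOR mysql_conn SET URL 'jdbc:mysql://newhost:3306';
{code}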



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25214) Add hive authorization support for Data connectors.

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25214?focusedWorklogId=609985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609985
 ]

ASF GitHub Bot logged work on HIVE-25214:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 23:00
Start Date: 10/Jun/21 23:00
Worklog Time Spent: 10m 
  Work Description: dantongdong opened a new pull request #2384:
URL: https://github.com/apache/hive/pull/2384


   [HIVE-25214](https://issues.apache.org/jira/browse/HIVE-25214): Add hive 
authorization support for Data connectors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609985)
Remaining Estimate: 0h
Time Spent: 10m

> Add hive authorization support for Data connectors.
> ---
>
> Key: HIVE-25214
> URL: https://issues.apache.org/jira/browse/HIVE-25214
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Dantong Dong
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to add authorization support for data connectors in Hive. The default 
> behavior should be:
> 1) Connectors can be created/dropped by users in the admin role.
> 2) Connectors have READ and WRITE permissions.
> *   READ permission is required to fetch a connector object or fetch all 
> connector names. So to create a REMOTE database using a connector, users will 
> need READ permission on the connector. DDL queries like "show connectors" and 
> "describe <connector>" will check for read access on the connector as well.
> *   WRITE permission is required to alter/drop a connector. DDL queries 
> like "alter connector" and "drop connector" will need WRITE access on the 
> connector.
> With this support in place, Ranger can integrate with it.
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-25238:
---

Assignee: Yongzhi Chen

> Make excluded SSL cipher suites configurable for Hive Web UI and HS2
> 
>
> Key: HIVE-25238
> URL: https://issues.apache.org/jira/browse/HIVE-25238
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Web UI
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>
> When starting a jetty http server, one can explicitly exclude certain 
> (unsecure)
> SSL cipher suites. This can be especially important, when Hive
> needs to be compliant with security regulations. Need add properties to 
> support Hive WebUi and HiveServer2 to this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609923
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 20:36
Start Date: 10/Jun/21 20:36
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r649514661



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
       .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609923)
Time Spent: 1h 40m  (was: 1.5h)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However, there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.
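
Assuming the flag referenced in the pull request is 
{{MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON}} (the exact property 
name below is an assumption and should be confirmed against MetastoreConf), 
enabling the extra metrics would then take two switches:

{code}
# existing switch for the metrics subsystem
hive.metastore.metrics.enabled=true
# assumed property name behind METASTORE_ACIDMETRICS_EXT_ON
metastore.acidmetrics.ext.on=true
{code}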



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?focusedWorklogId=609882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609882
 ]

ASF GitHub Bot logged work on HIVE-25234:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 18:54
Start Date: 10/Jun/21 18:54
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #2382:
URL: https://github.com/apache/hive/pull/2382#issuecomment-858911481


   @marton-bod @pvary @szlta Could you please review this change? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609882)
Time Spent: 20m  (was: 10m)

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}
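
Hedging on the exact transform syntax supported by the Hive/Iceberg 
integration, a concrete invocation might look like the following (column names 
and the transform list are made up):

{code:sql}
-- illustrative only; check the supported Iceberg transforms
ALTER TABLE tbl SET PARTITION SPEC (month(ts), bucket(16, customer_id));
{code}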



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25235:
--
Description: 
While I was looking at [HIVE-24846] to improve OOM logging, I realized that this 
is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside, with the JVM 
facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point; just requesting more work 
will further add to memory pressure.

The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
shut down, but we already get that from the JVM shutdown hook.  The shutdown 
hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} is set, which is 
the appropriate thing to do.

https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44

  was:
While I was looking at [HIVE-24846] to improve OOM logging, I realized that this 
is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside, with the JVM 
facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point; just requesting more work 
will further add to memory pressure.


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to improve OOM logging, I realized that 
> this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside, with the JVM 
> facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point; just requesting more 
> work will further add to memory pressure.
> The current OOM logic in {{HiveServer2OomHookRunner}} causes HiveServer2 to 
> shut down, but we already get that from the JVM shutdown hook.  The shutdown 
> hook is triggered if {{-XX:OnOutOfMemoryError="kill -9 %p"}} is set, which is 
> the appropriate thing to do.
> https://github.com/apache/hive/blob/328d197431b2ff1000fd9c56ce758013eff81ad8/service/src/java/org/apache/hive/service/server/HiveServer2.java#L443-L444
> https://github.com/apache/hive/blob/cb0541a31b87016fae8e4c0e7130532c6e5f8de7/service/src/java/org/apache/hive/service/server/HiveServer2OomHookRunner.java#L42-L44
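
For reference, a minimal sketch of the JVM-level alternative described above; 
these are standard HotSpot flags, and where exactly they are added to the 
HiveServer2 launch command is deployment-specific:

{code}
# Exit the JVM immediately on the first OutOfMemoryError:
-XX:+ExitOnOutOfMemoryError

# Or run an external command on OOM, e.g. kill the process and let a
# supervisor restart it:
-XX:OnOutOfMemoryError="kill -9 %p"
{code}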



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609837
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 17:30
Start Date: 10/Jun/21 17:30
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649387084



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+    this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+    client = metaStore.getClient();
+
+    // Clean up the database
+    client.dropDatabase(DB_NAME, true, true, true);
+    metaStore.cleanWarehouseDirs();
+    Database db = new DatabaseBuilder().
+        setName(DB_NAME).
+        create(client, metaStore.getConf());
+
+    // Create test tables with 3 partitions
+    createTable(TABLE_NAME, getYearPartCol());
+    createPartitions();
+
+    // Hack to initialize the stats tables. Not sure why the first test run using remote metastore is failing with
+    // error (Unable to update Column stats for  table due to: Table/View
[jira] [Work logged] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?focusedWorklogId=609829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609829
 ]

ASF GitHub Bot logged work on HIVE-25235:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 17:20
Start Date: 10/Jun/21 17:20
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2383:
URL: https://github.com/apache/hive/pull/2383


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609829)
Remaining Estimate: 0h
Time Spent: 10m

> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to improve OOM logging, I realized that 
> this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside, with the JVM 
> facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point; just requesting more 
> work will further add to memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25235:
--
Labels: pull-request-available  (was: )

> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While I was looking at [HIVE-24846] to improve OOM logging, I realized that 
> this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside, with the JVM 
> facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point; just requesting more 
> work will further add to memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant to be more resilient

2021-06-10 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25237:

Summary: Thrift CLI Service Protocol: Enhance HTTP variant to be more 
resilient  (was: Thrift CLI Service Protocol: Enhance HTTP variant)

> Thrift CLI Service Protocol: Enhance HTTP variant to be more resilient
> --
>
> Key: HIVE-25237
> URL: https://issues.apache.org/jira/browse/HIVE-25237
> Project: Hive
>  Issue Type: Improvement
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> I have been thinking about the (Thrift) CLI Service protocol between the 
> client and server.
> Cloudera's Prashanth Jayachandran (private e-mail) told me that its original 
> BINARY (TCP/IP) transport is designed +_differently_+ from the newer HTTP 
> transport. HTTP is used when we go through a Gateway. The design for HTTP is 
> stateless and different in nature from the direct BINARY TCP/IP connection. 
> This means that today a Hive Server 2 response to an HTTP query request can 
> be lost, and that is part of the design... It is the WARNING we have seen 
> when the Gateway drops its HTTP connection to Hive Server 2. We had been 
> thinking this was a bug, but it is by design.
> I think the HTTP design needs a rethink.
> When I worked for Tandem Computers a long time ago, messages were 
> fault-tolerant. They used a message sequence #. When you send a message to a 
> Tandem server, it goes to a process pair. The message gets routed to the 
> current process, called the primary. The primary computes the message work 
> and tells the backup process to remember the results before replying, in case 
> there is a failure. You can see where this goes: if there is a failure before 
> the client gets the result, it retries, and the backup process can 
> resiliently give back the result the primary sent it. This isn't unique to 
> Tandem -- even without a process pair, this is a general resilient protocol.
> The HTTP design says message loss is possible in both directions (request and 
> response). I think we should adopt a better scheme, but not necessarily a 
> process pair.
> The first principle of the rethink is that the +_client_+ needs to generate a 
> new operation num (an integer) that replaces the server-side generated random 
> GUID. And the client generates a new msg num within its new operation. So 
> beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 
> 1. If the client gets an OS connection kind of error, it retries with those 
> (57, 1) numbers. Hive Server 2 will remember the last response. When Hive 
> Server 2 gets a message, there are 3 cases:
> 1) The sessionId GUID is not valid -- for now we reject the request, because 
> it is likely Hive Server 2 killed the session, perhaps because it was 
> restarted.
> 2) The operationNum or operationMsgNum is new. (Assert that the msg num 
> increases monotonically.) Perform the request, save the response, and respond.
> 3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
> respond with the saved result.
> I think this message handling is in alignment with the HTTP philosophy that 
> the protocol is stateless and any message in between can be lost. And it will 
> shield the client from a whole category of message failures that 
> unnecessarily kill queries.
> This also allows us not to worry about which requests are idempotent; 
> instead, all requests are resilient.
> -
> Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
> idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
> apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]
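
A compressed sketch of the proposed resiliency scheme, covering cases (2) and 
(3) above (all class, field, and method names here are hypothetical, not 
existing HiveServer2 code; case (1), session validation, is assumed to happen 
upstream):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical server-side handler for the (operationNum, operationMsgNum)
// scheme: replay the saved response on retry, do new work otherwise.
final class ResilientOperationHandler<Req, Resp> {
  private static final class OpState<Resp> {
    long lastMsgNum;
    Resp lastResponse;
  }

  private final Map<Long, OpState<Resp>> ops = new ConcurrentHashMap<>();

  synchronized Resp handle(long opNum, long msgNum, Req request,
                           Function<Req, Resp> execute) {
    OpState<Resp> st = ops.computeIfAbsent(opNum, k -> new OpState<>());
    if (msgNum == st.lastMsgNum && st.lastResponse != null) {
      return st.lastResponse;                 // case 3: resend the saved result
    }
    if (msgNum < st.lastMsgNum) {
      throw new IllegalStateException("msg num must increase monotonically");
    }
    st.lastResponse = execute.apply(request); // case 2: new request, do the work
    st.lastMsgNum = msgNum;
    return st.lastResponse;
  }
}
{code}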



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant

2021-06-10 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-25237:
---


> Thrift CLI Service Protocol: Enhance HTTP variant
> -
>
> Key: HIVE-25237
> URL: https://issues.apache.org/jira/browse/HIVE-25237
> Project: Hive
>  Issue Type: Improvement
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> I have been thinking about the (Thrift) CLI Service protocol between the 
> client and server.
> Cloudera's Prashanth Jayachandran (private e-mail) told me that its original 
> BINARY (TCP/IP) transport is designed +_differently_+ from the newer HTTP 
> transport. HTTP is used when we go through a Gateway. The design for HTTP is 
> stateless and different in nature from the direct BINARY TCP/IP connection. 
> This means that today a Hive Server 2 response to an HTTP query request can 
> be lost, and that is part of the design... It is the WARNING we have seen 
> when the Gateway drops its HTTP connection to Hive Server 2. We had been 
> thinking this was a bug, but it is by design.
> I think the HTTP design needs a rethink.
> When I worked for Tandem Computers a long time ago, messages were 
> fault-tolerant. They used a message sequence #. When you send a message to a 
> Tandem server, it goes to a process pair. The message gets routed to the 
> current process, called the primary. The primary computes the message work 
> and tells the backup process to remember the results before replying, in case 
> there is a failure. You can see where this goes: if there is a failure before 
> the client gets the result, it retries, and the backup process can 
> resiliently give back the result the primary sent it. This isn't unique to 
> Tandem -- even without a process pair, this is a general resilient protocol.
> The HTTP design says message loss is possible in both directions (request and 
> response). I think we should adopt a better scheme, but not necessarily a 
> process pair.
> The first principle of the rethink is that the +_client_+ needs to generate a 
> new operation num (an integer) that replaces the server-side generated random 
> GUID. And the client generates a new msg num within its new operation. So 
> beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 
> 1. If the client gets an OS connection kind of error, it retries with those 
> (57, 1) numbers. Hive Server 2 will remember the last response. When Hive 
> Server 2 gets a message, there are 3 cases:
> 1) The sessionId GUID is not valid -- for now we reject the request, because 
> it is likely Hive Server 2 killed the session, perhaps because it was 
> restarted.
> 2) The operationNum or operationMsgNum is new. (Assert that the msg num 
> increases monotonically.) Perform the request, save the response, and respond.
> 3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
> respond with the saved result.
> I think this message handling is in alignment with the HTTP philosophy that 
> the protocol is stateless and any message in between can be lost. And it will 
> shield the client from a whole category of message failures that 
> unnecessarily kill queries.
> This also allows us not to worry about which requests are idempotent; 
> instead, all requests are resilient.
> -
> Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
> idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
> apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609823
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 17:05
Start Date: 10/Jun/21 17:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649370626



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";

Review comment:
   We do not need this anymore




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609823)
Time Spent: 3h  (was: 2h 50m)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> If direct SQL is disabled, the MetaStoreDirectSql object is not initialised, 
> and that causes the NPE.
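
A hedged sketch of the failure mode and the obvious guard. All names below are 
illustrative, loosely mirroring the direct-SQL/JDO split in the metastore's 
ObjectStore; this is not the actual fix:

{code:java}
import java.util.List;

// Illustrative only: the direct-SQL helper stays null when
// hive.metastore.try.direct.sql=false, so callers must fall back to JDO.
final class PartitionStatsReader {
  private final DirectSqlHelper directSql;   // null when direct SQL is disabled
  private final JdoHelper jdo;

  PartitionStatsReader(DirectSqlHelper directSql, JdoHelper jdo) {
    this.directSql = directSql;
    this.jdo = jdo;
  }

  List<Object> getPartitionStats(String db, String table) {
    if (directSql != null) {                 // guard against the NPE
      return directSql.getPartitionStats(db, table);
    }
    return jdo.getPartitionStats(db, table); // JDO fallback
  }

  interface DirectSqlHelper { List<Object> getPartitionStats(String db, String table); }
  interface JdoHelper { List<Object> getPartitionStats(String db, String table); }
}
{code}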



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609801
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 16:07
Start Date: 10/Jun/21 16:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649328040



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<String, String> msConf = new HashMap<String, String>();
+    // Enable trash, so it can be tested
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put("fs.trash.checkpoint.interval", "30");  // FS_TRASH_CHECKPOINT_INTERVAL_KEY
+    extraConf.put("fs.trash.interval", "30");             // FS_TRASH_INTERVAL_KEY (hadoop-2)
+    startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+    this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+    client = metaStore.getClient();
+
+    // Clean up the database
+    client.dropDatabase(DB_NAME, true, true, true);
+    metaStore.cleanWarehouseDirs();
+    Database db = 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609798
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 16:06
Start Date: 10/Jun/21 16:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649327492



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+    // Enable trash, so it can be tested
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put("fs.trash.checkpoint.interval", "30");  // FS_TRASH_CHECKPOINT_INTERVAL_KEY
+    extraConf.put("fs.trash.interval", "30"); // FS_TRASH_INTERVAL_KEY (hadoop-2)
+    startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());

Review comment:
   // Hack to initialize the stats tables. Not sure why the first test 
run using remote metastore is failing with
   // error (Unable to update 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609797
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 16:06
Start Date: 10/Jun/21 16:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649326863



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+    // Enable trash, so it can be tested
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put("fs.trash.checkpoint.interval", "30");  // FS_TRASH_CHECKPOINT_INTERVAL_KEY
+    extraConf.put("fs.trash.interval", "30"); // FS_TRASH_INTERVAL_KEY (hadoop-2)
+    startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(DB_NAME, true, true, true);
+metaStore.cleanWarehouseDirs();
+Database db = 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609799
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 16:06
Start Date: 10/Jun/21 16:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649327779



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {

Review comment:
   removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609799)
Time Spent: 2h 40m  (was: 2.5h)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609790
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 16:02
Start Date: 10/Jun/21 16:02
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649323335



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+    // Enable trash, so it can be tested
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put("fs.trash.checkpoint.interval", "30");  // FS_TRASH_CHECKPOINT_INTERVAL_KEY
+    extraConf.put("fs.trash.interval", "30"); // FS_TRASH_INTERVAL_KEY (hadoop-2)
+    startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(DB_NAME, true, true, true);
+metaStore.cleanWarehouseDirs();
+Database db = 

[jira] [Updated] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25235:
--
Description: 
While I was looking at [HIVE-24846] to better perform OOM logging, I realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's best to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when the server is clearly hosed at this point; just requesting more work 
will further add to the memory pressure.

  was:
While I was looking at [HIVE-24846] to better perform OOM logging and I just 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's be to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> While I was looking at [HIVE-24846] to better perform OOM logging, I realized 
> that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's best to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when the server is clearly hosed at this point; just requesting more work 
> will further add to the memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
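
For illustration, a minimal sketch of launching a JVM with the flags named above (pure JDK; the class path and main class below are placeholders, not the actual HiveServer2 launch code):

{code:java}
import java.io.IOException;

public class LaunchWithOomFlags {
  public static void main(String[] args) throws IOException {
    // Let the JVM handle OOM from outside the failing code path:
    // exit immediately, or run an external command against the pid.
    ProcessBuilder pb = new ProcessBuilder(
        "java",
        "-XX:+ExitOnOutOfMemoryError",           // terminate the JVM on the first OOM
        "-XX:OnOutOfMemoryError=kill -9 %p",     // alternative: run a command on OOM
        "-cp", "app.jar", "com.example.Server"); // placeholder class path and main class
    pb.inheritIO().start();
  }
}
{code}
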


[jira] [Assigned] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25235:
-


> Remove ThreadPoolExecutorWithOomHook
> 
>
> Key: HIVE-25235
> URL: https://issues.apache.org/jira/browse/HIVE-25235
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> While I was looking at [HIVE-24846] to better perform OOM logging and I just 
> realized that this is not a good way to handle OOM.
> https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java
> bq. there's likely no easy way for you to recover from it if you do catch it
> If we want to handle OOM, it's best to do it from outside. It's best to do it 
> with the JVM facilities:
> {{-XX:+ExitOnOutOfMemoryError}}
> {{-XX:OnOutOfMemoryError}}
> It seems odd that the OOM handler attempts to load a handler and then do more 
> work when clearly the server is hosed at this point and just requesting to do 
> more work will further add to memory pressure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609753
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 15:05
Start Date: 10/Jun/21 15:05
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r649272168



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
 "hive.metastore.acidmetrics.check.interval", 300,
 TimeUnit.SECONDS,
 "Time in seconds between acid related metric collection runs."),
+METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", "hive.metastore.acidmetrics.ext.on", true,
+    "Whether to collect additional acid related metrics outside of the acid metrics service."),

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609753)
Time Spent: 1.5h  (was: 1h 20m)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
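
A hedged sketch of what gating a collection site behind the new flag could look like (the metric name and surrounding code are assumptions; only the ConfVars entry comes from the diff above):

{code:java}
// Update the extra ACID metrics only when both the general metrics switch
// and the new extension flag are enabled.
if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METRICS_ENABLED) &&
    MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {
  Metrics.getOrCreateCounter("num_aborted_txns").inc(); // hypothetical metric name
}
{code}
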


[jira] [Commented] (HIVE-21997) [HiveMS] Hive Metastore as Mysql backend DB

2021-06-10 Thread XixiHua (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360992#comment-17360992
 ] 

XixiHua commented on HIVE-21997:


Hi, the log indicates that the hive-site.xml file was not found. Did you add a 
hive-site.xml file under the conf folder?

> [HiveMS] Hive Metastore as Mysql backend DB
> ---
>
> Key: HIVE-21997
> URL: https://issues.apache.org/jira/browse/HIVE-21997
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Anand
>Priority: Blocker
> Attachments: metastore.log
>
>
> I installed hive-standalone-metastore-3.0.0 using MySQL as the backend DB; it 
> successfully initialized the schema and the server is running.
>  
> *Note:* This installation does not include Hive and Hadoop; it is only the 
> Hive metastore, with a local directory and MySQL as the backend database. I 
> verified that all tables were created in the backend DB when the schema was 
> initialized.
>  
> But when I run schematool -dbType mysql -passWord root -userName root 
> -validate to validate it, this command kills the running metastore server 
> process.
>  
> Logs for the same are attached with this mail (there is no log written when 
> the server failed, which is why the reason behind it is unknown).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error

2021-06-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360986#comment-17360986
 ] 

Zoltan Haindrich commented on HIVE-25224:
-

This check could be suppressed even for tables with more than one bucket: the 
FileSink operators compute the bucket id from the row itself.

> Multi insert statements involving tables with different bucketing_versions 
> results in error
> ---
>
> Key: HIVE-25224
> URL: https://issues.apache.org/jira/browse/HIVE-25224
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> drop table if exists t;
> drop table if exists t2;
> drop table if exists t3;
> create table t (a integer);
> create table t2 (a integer);
> create table t3 (a integer);
> alter table t set tblproperties ('bucketing_version'='1');
> explain from t3 insert into t select a insert into t2 select a;
> {code}
> results in
> {code}
> Error: Error while compiling statement: FAILED: RuntimeException Error 
> setting bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: 
> FS[11], bucketingVersion=2]] (state=42000,code=4)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
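
For context, a rough sketch of deriving the bucket id per row from the writing operator's own bucketing version (hedged; this is not the actual FS operator code, though the ObjectInspectorUtils helpers exist in Hive):

{code:java}
// bucketing_version 2 uses the Murmur-based hash, version 1 the legacy hash.
// Since each FileSink can apply its own table's version per row, a group-level
// version consistency check is not needed for cases like the repro above.
int hash = bucketingVersion == 2
    ? ObjectInspectorUtils.getBucketHashCode(bucketFields, bucketFieldInspectors)
    : ObjectInspectorUtils.getBucketHashCodeOld(bucketFields, bucketFieldInspectors);
int bucketId = ObjectInspectorUtils.getBucketNumber(hash, numBuckets);
{code}
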


[jira] [Work logged] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?focusedWorklogId=609733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609733
 ]

ASF GitHub Bot logged work on HIVE-25234:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 14:38
Start Date: 10/Jun/21 14:38
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2382:
URL: https://github.com/apache/hive/pull/2382


   
   
   
   ### What changes were proposed in this pull request?
   
   Create new syntax to support partition spec change on Iceberg tables.
   `ALTER TABLE tbl SET PARTITION SPEC(years(ts), id)`
   
   During the alter operation, the new partition spec will overwrite the 
existing partition spec. 
   
   
   
   
   ### Why are the changes needed?
   HIVE-25179 introduced support for partition transforms from DDL. We need a way to alter existing partition transforms.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   New syntax.
   
   
   
   ### How was this patch tested?
   Manual test, unit test
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609733)
Remaining Estimate: 0h
Time Spent: 10m

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
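
For reference, a hedged sketch of the roughly equivalent call through the Iceberg table API (catalog wiring and identifiers are placeholders; an overwrite as described above would also remove the pre-existing spec fields first):

{code:java}
// Equivalent of ALTER TABLE tbl SET PARTITION SPEC(years(ts), id):
Table table = catalog.loadTable(TableIdentifier.of("db", "tbl"));
table.updateSpec()
    .addField(Expressions.year("ts")) // transform-based partition field
    .addField("id")                   // identity partition field
    .commit();
{code}
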


[jira] [Updated] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25234:
--
Labels: pull-request-available  (was: )

> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25234:



> Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on 
> Iceberg tables
> -
>
> Key: HIVE-25234
> URL: https://issues.apache.org/jira/browse/HIVE-25234
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Provide a way to change the schema and the Iceberg partitioning specification 
> using Hive syntax.
> {code:sql}
> ALTER TABLE tbl SET PARTITION SPEC(...)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25223) Select with limit returns no rows on non native table

2021-06-10 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25223.
---
Resolution: Fixed

Pushed to master. Thanks [~amagyar].

> Select with limit returns no rows on non native table
> -
>
> Key: HIVE-25223
> URL: https://issues.apache.org/jira/browse/HIVE-25223
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> {code:java}
> CREATE EXTERNAL TABLE hht (key string, value int) 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" 
> = "hht");
> insert into hht select uuid(), cast((rand() * 100) as int);
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
>  set hive.fetch.task.conversion=none;
>  select * from hht limit 10;
> +--++
> | hht.key  | hht.value  |
> +--++
> +--++
> No rows selected (5.22 seconds) {code}
>  
> This is caused by GlobalLimitOptimizer. The table directory is always empty 
> with a non native table since the data is not managed by hive (but hbase in 
> this case).
> The optimizer scans the directory and sets the file list to an empty list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
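
A hedged sketch of the kind of guard that addresses this (the actual patch may differ):

{code:java}
// In the GlobalLimitOptimizer path: for non-native tables (e.g. HBase-backed),
// the warehouse directory is empty, so file-based pruning must be skipped
// instead of reducing the input to an empty file list.
if (table.isNonNative()) {
  return; // leave the query plan untouched for storage-handler tables
}
{code}
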


[jira] [Work logged] (HIVE-25223) Select with limit returns no rows on non native table

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25223?focusedWorklogId=609716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609716
 ]

ASF GitHub Bot logged work on HIVE-25223:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 14:00
Start Date: 10/Jun/21 14:00
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2375:
URL: https://github.com/apache/hive/pull/2375


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609716)
Time Spent: 0.5h  (was: 20m)

> Select with limit returns no rows on non native table
> -
>
> Key: HIVE-25223
> URL: https://issues.apache.org/jira/browse/HIVE-25223
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> {code:java}
> CREATE EXTERNAL TABLE hht (key string, value int) 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" 
> = "hht");
> insert into hht select uuid(), cast((rand() * 100) as int);
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
>  set hive.fetch.task.conversion=none;
>  select * from hht limit 10;
> +--++
> | hht.key  | hht.value  |
> +--++
> +--++
> No rows selected (5.22 seconds) {code}
>  
> This is caused by GlobalLimitOptimizer. The table directory is always empty 
> with a non native table since the data is not managed by hive (but hbase in 
> this case).
> The optimizer scans the directory and sets the file list to an empty list.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609663
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 12:03
Start Date: 10/Jun/21 12:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649117229



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        // Aborting the job if the commit has failed
+        LOG.error("Error while trying to commit job: {}, starting rollback changes for table: {}",
+            jobContext.get().getJobID(), tableName, e);
+        try {
+          committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+        } catch (IOException ioe) {
+          LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", ioe);
+          // no throwing here because the original exception should be propagated
+        }
+        throw new HiveException("Error committing job: " + jobContext.get().getJobID() + " for table: " + tableName);

Review comment:
   Ehh... whenever I try to do things fast, I make stupid mistakes.
   Thanks for catching it.

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job

Review comment:
   Removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609663)
Time Spent: 3h 40m  (was: 3.5h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
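
To make the intended flow concrete, a speculative sketch of how the move path could drive the handler commit (only commitInMoveTask, storageHandlerCommit and Catalogs.NAME come from the diff above; everything else is an assumption):

{code:java}
// Inside the move/commit path: delegate the final commit to the storage
// handler when it opts in, mirroring the flow used for native tables.
HiveStorageHandler handler = table.getStorageHandler();
if (handler != null && handler.commitInMoveTask()) {
  Properties commitProperties = new Properties();
  commitProperties.put(Catalogs.NAME, table.getFullyQualifiedName());
  handler.storageHandlerCommit(commitProperties, isInsertOverwrite); // isInsertOverwrite: hypothetical flag
}
{code}
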


[jira] [Work logged] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25224?focusedWorklogId=609659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609659
 ]

ASF GitHub Bot logged work on HIVE-25224:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:47
Start Date: 10/Jun/21 11:47
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2381:
URL: https://github.com/apache/hive/pull/2381


   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609659)
Remaining Estimate: 0h
Time Spent: 10m

> Multi insert statements involving tables with different bucketing_versions 
> results in error
> ---
>
> Key: HIVE-25224
> URL: https://issues.apache.org/jira/browse/HIVE-25224
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> drop table if exists t;
> drop table if exists t2;
> drop table if exists t3;
> create table t (a integer);
> create table t2 (a integer);
> create table t3 (a integer);
> alter table t set tblproperties ('bucketing_version'='1');
> explain from t3 insert into t select a insert into t2 select a;
> {code}
> results in
> {code}
> Error: Error while compiling statement: FAILED: RuntimeException Error 
> setting bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: 
> FS[11], bucketingVersion=2]] (state=42000,code=4)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25224:
--
Labels: pull-request-available  (was: )

> Multi insert statements involving tables with different bucketing_versions 
> results in error
> ---
>
> Key: HIVE-25224
> URL: https://issues.apache.org/jira/browse/HIVE-25224
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> drop table if exists t;
> drop table if exists t2;
> drop table if exists t3;
> create table t (a integer);
> create table t2 (a integer);
> create table t3 (a integer);
> alter table t set tblproperties ('bucketing_version'='1');
> explain from t3 insert into t select a insert into t2 select a;
> {code}
> results in
> {code}
> Error: Error while compiling statement: FAILED: RuntimeException Error 
> setting bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: 
> FS[11], bucketingVersion=2]] (state=42000,code=4)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609656
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:26
Start Date: 10/Jun/21 11:26
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649092943



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job

Review comment:
   nit: unnecessary comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609656)
Time Spent: 3.5h  (was: 3h 20m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609652
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:23
Start Date: 10/Jun/21 11:23
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649091082



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        // Aborting the job if the commit has failed
+        LOG.error("Error while trying to commit job: {}, starting rollback changes for table: {}",
+            jobContext.get().getJobID(), tableName, e);
+        try {
+          committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+        } catch (IOException ioe) {
+          LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", ioe);
+          // no throwing here because the original exception should be propagated
+        }
+        throw new HiveException("Error committing job: " + jobContext.get().getJobID() + " for table: " + tableName);

Review comment:
   Can we include `e` in the exception to get the underlying cause?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609652)
Time Spent: 3h 20m  (was: 3h 10m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
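
The suggested change would presumably look like this (hedged sketch, using HiveException's (String, Throwable) constructor):

{code:java}
// Keep the original failure as the cause so the root error is not lost.
throw new HiveException("Error committing job: " + jobContext.get().getJobID() +
    " for table: " + tableName, e);
{code}
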


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609647
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:14
Start Date: 10/Jun/21 11:14
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649085738



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+    // Enable trash, so it can be tested
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put("fs.trash.checkpoint.interval", "30");  // FS_TRASH_CHECKPOINT_INTERVAL_KEY
+    extraConf.put("fs.trash.interval", "30"); // FS_TRASH_INTERVAL_KEY (hadoop-2)
+    startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client, store the original value of directSql to restore it back after test.
+    directSql = metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(DB_NAME, true, true, true);
+metaStore.cleanWarehouseDirs();
+Database db = new 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609645
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:12
Start Date: 10/Jun/21 11:12
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649084530



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+// Enable trash, so it can be tested
+Map<String, String> extraConf = new HashMap<>();
+extraConf.put("fs.trash.checkpoint.interval", "30");  // 
FS_TRASH_CHECKPOINT_INTERVAL_KEY
+extraConf.put("fs.trash.interval", "30"); // 
FS_TRASH_INTERVAL_KEY (hadoop-2)
+startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+// Get new client, store the original value of directSql to restore it 
back after test.
+directSql = 
metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(DB_NAME, true, true, true);
+metaStore.cleanWarehouseDirs();
+Database db = new 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609644
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:11
Start Date: 10/Jun/21 11:11
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649083980



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+// Enable trash, so it can be tested
+Map<String, String> extraConf = new HashMap<>();
+extraConf.put("fs.trash.checkpoint.interval", "30");  // 
FS_TRASH_CHECKPOINT_INTERVAL_KEY
+extraConf.put("fs.trash.interval", "30"); // 
FS_TRASH_INTERVAL_KEY (hadoop-2)
+startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+// Get new client, store the original value of directSql to restore it 
back after test.
+directSql = 
metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());

Review comment:
   I am not sure how a `Parameterized` test will behave if we override the 
`MetaStoreClientTest.getMetaStoreToTest` method, but we might spend a few 
minutes trying it out.
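
   For context, a minimal, self-contained sketch of the JUnit 4 `Parameterized`
   plumbing in question: the runner calls a public static factory annotated with
   `@Parameterized.Parameters` and instantiates the test class once per returned
   row. The class name and parameter values below are illustrative stand-ins,
   not `MetaStoreClientTest` internals.

   ```
   import java.util.Arrays;
   import java.util.Collection;

   import org.junit.Test;
   import org.junit.runner.RunWith;
   import org.junit.runners.Parameterized;

   @RunWith(Parameterized.class)
   public class ParameterizedSketch {
     // The runner calls this once per suite; in MetaStoreClientTest this role
     // is played by getMetaStoreToTest(). The values here are stand-ins.
     @Parameterized.Parameters(name = "{0}")
     public static Collection<Object[]> params() {
       return Arrays.asList(new Object[][] {{"directsql-on"}, {"directsql-off"}});
     }

     private final String name;

     public ParameterizedSketch(String name) {
       this.name = name;
     }

     @Test
     public void runsOncePerParameter() {
       System.out.println("running with " + name);
     }
   }
   ```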

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609643
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:05
Start Date: 10/Jun/21 11:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649080491



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {

Review comment:
   Do we need this method?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609643)
Time Spent: 1.5h  (was: 1h 20m)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609640
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 11:00
Start Date: 10/Jun/21 11:00
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649077478



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestPartitionStat.java
##
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.ColumnStatistics;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsData;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsDesc;
+import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
+import org.apache.hadoop.hive.metastore.api.Database;
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.SetPartitionsStatsRequest;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.apache.hadoop.hive.metastore.utils.FileUtils;
+
+import org.apache.thrift.TException;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import com.google.common.collect.Lists;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Tests for updating partition column stats.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestPartitionStat extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private String directSql = "false";
+
+  protected static final String DB_NAME = "test_part_stat";
+  protected static final String TABLE_NAME = "test_part_stat_table";
+  private static final String DEFAULT_COL_TYPE = "int";
+  private static final String PART_COL_NAME = "year";
+  protected static final short MAX = -1;
+  private static final Partition[] PARTITIONS = new Partition[5];
+  public static final String HIVE_ENGINE = "hive";
+
+  @BeforeClass
+  public static void startMetaStores() {
+Map<MetastoreConf.ConfVars, String> msConf = new HashMap<MetastoreConf.ConfVars, String>();
+// Enable trash, so it can be tested
+Map<String, String> extraConf = new HashMap<>();
+extraConf.put("fs.trash.checkpoint.interval", "30");  // 
FS_TRASH_CHECKPOINT_INTERVAL_KEY
+extraConf.put("fs.trash.interval", "30"); // 
FS_TRASH_INTERVAL_KEY (hadoop-2)
+startMetaStores(msConf, extraConf);
+  }
+
+  public TestPartitionStat(String name, AbstractMetaStoreService metaStore) {
+this.metaStore = metaStore;
+  }
+
+  @Before
+  public void setUp() throws Exception {
+// Get new client, store the original value of directSql to restore it 
back after test.
+directSql = 
metaStore.getConf().get(MetastoreConf.ConfVars.TRY_DIRECT_SQL.getVarname());
+client = metaStore.getClient();
+
+// Clean up the database
+client.dropDatabase(DB_NAME, true, true, true);
+metaStore.cleanWarehouseDirs();
+Database db = new 

[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609639
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 10:53
Start Date: 10/Jun/21 10:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r649073019



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -9037,8 +9061,16 @@ public boolean 
set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc
   } else { // No merge.
 Table t = getTable(catName, dbName, tableName);
 // We don't short-circuit on errors here anymore. That can leave acid 
stats invalid.
-ret = updatePartitionColStatsInBatch(t, newStatsMap,
-request.getValidWriteIdList(), request.getWriteId());
+if (MetastoreConf.getBoolVar(getConf(), ConfVars.TRY_DIRECT_SQL)) {

Review comment:
   As discussed offline, it would be good to move the batching logic back to 
the ObjectStore, but that is a different patch altogether.
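
   A hedged sketch of that guard, using the MetastoreConf/ConfVars names quoted
   in the diff above; the two update methods are illustrative stand-ins for
   updatePartitionColStatsInBatch and the JDO-based ObjectStore update, not the
   actual patch:

   ```
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
   import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;

   final class DirectSqlGuardSketch {
     static void updateStats(Configuration conf) {
       if (MetastoreConf.getBoolVar(conf, ConfVars.TRY_DIRECT_SQL)) {
         // MetaStoreDirectSql is only initialised when direct SQL is enabled,
         // so the batched path is safe to take here.
         updateViaDirectSql();
       } else {
         // Falling back to the ObjectStore path avoids the NPE in this ticket.
         updateViaObjectStore();
       }
     }

     private static void updateViaDirectSql() { /* batched direct-SQL update */ }

     private static void updateViaObjectStore() { /* per-partition JDO update */ }
   }
   ```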




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609639)
Time Spent: 1h 10m  (was: 1h)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In case direct sql is disabled, the MetaStoreDirectSql object is not 
> initialised and that's causing an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-10 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25226.
---
Resolution: Won't Fix

> Hive changes 'storage_handler' for existing Iceberg table when 
> hive.engine.enabled is false
> ---
>
> Key: HIVE-25226
> URL: https://issues.apache.org/jira/browse/HIVE-25226
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>  Labels: iceberg
>
> If Hive writes to an existing Iceberg table but property 
> 'hive.engine.enabled' is not set, then Hive rewrites the table metadata with 
> a different SerDe/Input/Output format than it had before.
> E.g. there's an existing table with the following metadata:
> {noformat}
>   storage_handler  | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
> | SerDe Library: | org.apache.iceberg.mr.hive.HiveIcebergSerDe | NULL |
> | InputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergInputFormat | NULL |
> | OutputFormat:  | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat | NULL |
> {noformat}
> Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
> the rest:
> {noformat}
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat:   | org.apache.hadoop.mapred.FileInputFormat | NULL |
> | OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat | NULL |
> {noformat}
> This means the table becomes unreadable:
> {noformat}
> Error: java.io.IOException: java.io.IOException: Cannot create an instance of 
> InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in 
> mapredWork! (state=,code=0)
> {noformat}
> I think Hive should always set 'hive.engine.enabled' for Iceberg.
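
A minimal sketch of that property pin through the Iceberg Java API, assuming a Table handle is already loaded (for instance via Catalogs.loadTable). Note that the table-level key is spelled 'engine.hive.enabled', and the raw string is used below because the corresponding TableProperties constant may not exist in every Iceberg version:

```
import org.apache.iceberg.Table;

final class EngineHiveEnabledSketch {
  // Pin the property so Hive keeps the Iceberg SerDe/InputFormat/OutputFormat
  // intact instead of rewriting them on insert.
  static void enableHiveEngine(Table table) {
    table.updateProperties()
        .set("engine.hive.enabled", "true")
        .commit();
  }
}
```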



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-10 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360701#comment-17360701
 ] 

Marton Bod commented on HIVE-25226:
---

As discussed with [~boroknagyz], this is currently the expected behaviour if 
engine.hive.enabled is not set to true in the table properties. This will be 
handled by: https://issues.apache.org/jira/browse/IMPALA-10741

> Hive changes 'storage_handler' for existing Iceberg table when 
> hive.engine.enabled is false
> ---
>
> Key: HIVE-25226
> URL: https://issues.apache.org/jira/browse/HIVE-25226
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>  Labels: iceberg
>
> If Hive writes to an existing Iceberg table but property 
> 'hive.engine.enabled' is not set, then Hive rewrites the table metadata with 
> a different SerDe/Input/Output format than it had before.
> E.g. there's an existing table with the following metadata:
> {noformat}
>   storage_handler  | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
> | SerDe Library: | org.apache.iceberg.mr.hive.HiveIcebergSerDe | NULL |
> | InputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergInputFormat | NULL |
> | OutputFormat:  | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat | NULL |
> {noformat}
> Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
> the rest:
> {noformat}
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat:   | org.apache.hadoop.mapred.FileInputFormat | NULL |
> | OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat | NULL |
> {noformat}
> This means the table becomes unreadable:
> {noformat}
> Error: java.io.IOException: java.io.IOException: Cannot create an instance of 
> InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in 
> mapredWork! (state=,code=0)
> {noformat}
> I think Hive should always set 'hive.engine.enabled' for Iceberg.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609598
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:46
Start Date: 10/Jun/21 09:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649028108



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
 try (LocalTableLock lock = 
acquireLockForFileMove(work.getLoadTableWork())) {
+  String storageHandlerClass = null;

Review comment:
   Refactored out code




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609598)
Time Spent: 3h 10m  (was: 3h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25233:
--
Labels: pull-request-available  (was: )

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?focusedWorklogId=609595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609595
 ]

ASF GitHub Bot logged work on HIVE-25233:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:44
Start Date: 10/Jun/21 09:44
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #2380:
URL: https://github.com/apache/hive/pull/2380


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609595)
Remaining Estimate: 0h
Time Spent: 10m

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609591
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:36
Start Date: 10/Jun/21 09:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649021325



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
 return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} 
method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link 
org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
   settled on `commitInMoveTask` and `storageHandlerCommit`
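
   A hedged sketch of the dispatch those two names imply inside MoveTask: when
   the handler opts in via commitInMoveTask(), the commit is delegated to
   storageHandlerCommit() and the usual file-copy path is skipped. The helper
   below is illustrative only, and compiles only against a build that already
   contains this patch:

   ```
   import java.util.Properties;

   import org.apache.hadoop.hive.ql.metadata.HiveException;
   import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;

   final class StorageHandlerCommitSketch {
     // Returns true when the storage handler performed the commit itself,
     // i.e. MoveTask has nothing left to move.
     static boolean tryStorageHandlerCommit(HiveStorageHandler handler,
         Properties commitProperties, boolean overwrite) throws HiveException {
       if (handler == null || !handler.commitInMoveTask()) {
         return false; // run the normal file-move path instead
       }
       handler.storageHandlerCommit(commitProperties, overwrite); // e.g. Iceberg commit
       return true;
     }
   }
   ```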




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609591)
Time Spent: 3h  (was: 2h 50m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609585
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:34
Start Date: 10/Jun/21 09:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649019475



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
 try (LocalTableLock lock = 
acquireLockForFileMove(work.getLoadTableWork())) {
+  String storageHandlerClass = null;
+  Properties commitProperties = null;
+  boolean overwrite = false;
+
+  if (work.getLoadTableWork() != null) {
+// Get the info from the table data
+TableDesc tableDesc = work.getLoadTableWork().getTable();
+storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
   Did not change as discussed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609585)
Time Spent: 2h 50m  (was: 2h 40m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609584
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:34
Start Date: 10/Jun/21 09:34
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649019156



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) 
throws HiveException {
+String tableName = commitProperties.getProperty(Catalogs.NAME);
+Configuration configuration = SessionState.getSessionConf();
+Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+if (jobContext.isPresent()) {
+  try {
+OutputCommitter committer = new HiveIcebergOutputCommitter();
+committer.commitJob(jobContext.get());
+  } catch (Throwable e) {
+LOG.error("Error while trying to commit job, starting rollback", e);
+rollbackInsertTable(configuration, tableName, overwrite);
+throw new HiveException("Error committing job", e);

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609584)
Time Spent: 2h 40m  (was: 2.5h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25222?focusedWorklogId=609581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609581
 ]

ASF GitHub Bot logged work on HIVE-25222:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 09:29
Start Date: 10/Jun/21 09:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2368:
URL: https://github.com/apache/hive/pull/2368#discussion_r649015411



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##
@@ -201,8 +203,11 @@ public static Schema readSchema(Configuration conf) {
   }
 
   public static String[] selectedColumns(Configuration conf) {
-String[] readColumns = conf.getStrings(InputFormatConfig.SELECTED_COLUMNS);
-return readColumns != null && readColumns.length > 0 ? readColumns : null;
+String readColumns = conf.get(InputFormatConfig.SELECTED_COLUMNS);
+if (readColumns == null || readColumns.isEmpty()) {
+  return null;
+}

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609581)
Time Spent: 20m  (was: 10m)

> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.
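
A self-contained repro sketch of that failure mode: Hadoop's Configuration.getStrings() splits its value on commas, so a column literally named "employ,ee" comes back as two separate entries.

```
import org.apache.hadoop.conf.Configuration;

public class CommaColumnRepro {
  public static void main(String[] args) {
    // No defaults needed; we only exercise the comma-splitting getter.
    Configuration conf = new Configuration(false);
    conf.set("hive.io.file.readcolumn.names", "id,birth_date,employ,ee");
    String[] cols = conf.getStrings("hive.io.file.readcolumn.names");
    System.out.println(cols.length); // prints 4, though only 3 columns were meant
  }
}
```

This is also why the change quoted earlier in this thread reads InputFormatConfig.SELECTED_COLUMNS with conf.get() rather than conf.getStrings().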



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25232) Update Hive syntax to use plural form of time based partition transforms

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-25232.
--
Resolution: Invalid

> Update Hive syntax to use plural form of time based partition transforms 
> -
>
> Key: HIVE-25232
> URL: https://issues.apache.org/jira/browse/HIVE-25232
> Project: Hive
>  Issue Type: Task
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> We should follow the [SparkSQL 
> syntax|https://iceberg.apache.org/spark-ddl/#partitioned-by] when defining 
> partition transforms for Iceberg tables. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25233:
-
Description: 
Description

Since the unix_timestamp() UDF was deprecated as part of 
https://issues.apache.org/jira/browse/HIVE-10728, the internal 
GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
date, string pattern).


unix_timestamp()   => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_unix_timestamp()
unix_timestamp(string date, string pattern) => to_unix_timestamp()


We should clean up unix_timestamp() and point it to to_utc_timestamp()
   


  was:
Description

Since the unix_timestamp() UDF was deprecated as part of 
https://issues.apache.org/jira/browse/HIVE-10728, the internal 
GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
date, string pattern).


unix_timestamp()   => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_utc_timestamp()
unix_timestamp(string date, string pattern) => to_utc_timestamp()


We should clean up unix_timestamp() and point it to to_utc_timestamp()
   



> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_utc_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25233:
-
Summary: Removing deprecated unix_timestamp UDF  (was: Removing deprecated 
unix_timestamp() UDF)

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_utc_timestamp()
> unix_timestamp(string date, string pattern) => to_utc_timestamp()
> We should clean up unix_timestamp() and point it to to_utc_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-06-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25233:
-
Description: 
Description

Since the unix_timestamp() UDF was deprecated as part of 
https://issues.apache.org/jira/browse/HIVE-10728, the internal 
GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
date, string pattern).


unix_timestamp()   => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_unix_timestamp()
unix_timestamp(string date, string pattern) => to_unix_timestamp()


We should clean up unix_timestamp() and point it to to_unix_timestamp()
   


  was:
Description

Since the unix_timestamp() UDF was deprecated as part of 
https://issues.apache.org/jira/browse/HIVE-10728, the internal 
GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
date, string pattern).


unix_timestamp()   => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_unix_timestamp()
unix_timestamp(string date, string pattern) => to_unix_timestamp()


We should clean up unix_timestamp() and point it to to_utc_timestamp()
   



> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25233) Removing deprecated unix_timestamp() UDF

2021-06-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25233 started by Ashish Sharma.

> Removing deprecated unix_timestamp() UDF
> 
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_utc_timestamp()
> unix_timestamp(string date, string pattern) => to_utc_timestamp()
> We should clean up unix_timestamp() and point it to to_utc_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25233) Removing deprecated unix_timestamp() UDF

2021-06-10 Thread Ashish Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma reassigned HIVE-25233:



> Removing deprecated unix_timestamp() UDF
> 
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>
> Description
> Since the unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728, the internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_utc_timestamp()
> unix_timestamp(string date, string pattern) => to_utc_timestamp()
> We should clean up unix_timestamp() and point it to to_utc_timestamp()
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609556
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:14
Start Date: 10/Jun/21 08:14
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648955441



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
 return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} 
method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link 
org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
   Sounds good.
   Maybe for the first one, `commitWithMoveTask() default true;`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609556)
Time Spent: 2.5h  (was: 2h 20m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609553
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:12
Start Date: 10/Jun/21 08:12
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648953881



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, 
Operator input)
   destTableId++;
   // Create the work for moving the table
   // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-  if (!isNonNativeTable) {
+  if (!isNonNativeTable || 
destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
   Oh I see. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609553)
Time Spent: 2h 10m  (was: 2h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609554
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:12
Start Date: 10/Jun/21 08:12
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648954028



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
 try (LocalTableLock lock = 
acquireLockForFileMove(work.getLoadTableWork())) {
+  String storageHandlerClass = null;
+  Properties commitProperties = null;
+  boolean overwrite = false;
+
+  if (work.getLoadTableWork() != null) {
+// Get the info from the table data
+TableDesc tableDesc = work.getLoadTableWork().getTable();
+storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
   Makes sense




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609554)
Time Spent: 2h 20m  (was: 2h 10m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609550
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:04
Start Date: 10/Jun/21 08:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648947866



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, 
Operator input)
   destTableId++;
   // Create the work for moving the table
   // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-  if (!isNonNativeTable) {
+  if (!isNonNativeTable || 
destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
   Otherwise we will end up with `PreInsertTableOperation` instead of 
`MoveTask`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609550)
Time Spent: 2h  (was: 1h 50m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609547
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:01
Start Date: 10/Jun/21 08:01
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648945777



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
 return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} 
method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link 
org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
   What do you think about these names?
   ```
 /**
  * Checks if we should keep the {@link 
org.apache.hadoop.hive.ql.exec.MoveTask} and use the
  * {@link #storageHandlerCommit(Properties, boolean)} method for 
committing inserts instead of
  * {@link 
org.apache.hadoop.hive.metastore.DefaultHiveMetaHook#commitInsertTable(Table, 
boolean)}.
  * @return Returns true if we should use the {@link 
#storageHandlerCommit(Properties, boolean)} method
  */
 default boolean keepMoveTask() {
   return false;
 }
   
 /**
  * Commits the inserts for the non-native tables. Used in the {@link 
org.apache.hadoop.hive.ql.exec.MoveTask}.
  * @param commitProperties Commit properties which are needed for the 
handler based commit
  * @param overwrite If this is an INSERT OVERWRITE then it is true
  * @throws HiveException If there is an error during commit
  */
 default void storageHandlerCommit(Properties commitProperties, boolean 
overwrite) throws HiveException {
   throw new UnsupportedOperationException();
 }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609547)
Time Spent: 1h 50m  (was: 1h 40m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609545
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:01
Start Date: 10/Jun/21 08:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648945402



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;

Review comment:
   Would it make sense to move this newly-added code into a method? e.g. 
`boolean checkAndCommitNatively(work)` or something like that, with some 
javadoc. What do you think?
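
   A rough shape for that extraction might be the following (a sketch assembled from this diff, with assumed signature and constants, not the final patch):

   ```java
   /**
    * Sketch of the suggested helper: returns true when the storage handler has
    * committed the insert natively, so MoveTask can skip the usual file move.
    */
   private boolean checkAndCommitNatively(MoveWork work, Configuration conf,
       Properties commitProperties, boolean overwrite) throws HiveException {
     if (work.getLoadTableWork() == null) {
       return false;
     }
     TableDesc tableDesc = work.getLoadTableWork().getTable();
     String handlerClass = tableDesc.getProperties()
         .getProperty(hive_metastoreConstants.META_TABLE_STORAGE);
     if (handlerClass == null) {
       return false; // native table: fall through to the normal file move
     }
     HiveStorageHandler handler = HiveUtils.getStorageHandler(conf, handlerClass);
     if (handler == null || !handler.useNativeCommit()) {
       return false;
     }
     handler.nativeCommit(commitProperties, overwrite);
     return true;
   }
   ```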




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609545)
Time Spent: 1h 40m  (was: 1.5h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609544
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 08:00
Start Date: 10/Jun/21 08:00
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2376:
URL: https://github.com/apache/hive/pull/2376#issuecomment-858403272


   > Thanks for the quick fix @maheshk114!
   > 
   > Left some comments on the code.
   > Also we could definitely use some additional tests which could make sure 
that someone else later does not break this functionality.
   > 
   > Thanks,
   > Peter
   
   Thanks Peter, for the review. I have added the tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609544)
Time Spent: 1h  (was: 50m)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In case direct sql is disabled, the MetaStoreDirectSql object is not 
> initialised and that's causing an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25232) Update Hive syntax to use plural form of time based partition transforms

2021-06-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25232:



> Update Hive syntax to use plural form of time based partition transforms 
> -
>
> Key: HIVE-25232
> URL: https://issues.apache.org/jira/browse/HIVE-25232
> Project: Hive
>  Issue Type: Task
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> We should follow the [SparkSQL 
> syntax|https://iceberg.apache.org/spark-ddl/#partitioned-by] when defining 
> partition transform for Iceberg tables. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25225) Update column stat throws NPE if direct sql is disabled

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25225?focusedWorklogId=609543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609543
 ]

ASF GitHub Bot logged work on HIVE-25225:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:59
Start Date: 10/Jun/21 07:59
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2376:
URL: https://github.com/apache/hive/pull/2376#discussion_r648942867



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -7006,16 +7006,40 @@ private boolean updatePartitonColStatsInternal(Table tbl, ColumnStatistics colSt
         + " part=" + csd.getPartName());
 
     boolean ret = false;
+
+    Map<String, String> parameters;
+    List<String> partVals;
+    boolean committed = false;
+    getMS().openTransaction();
+
     try {
       if (tbl == null) {
         tbl = getTable(catName, dbName, tableName);
       }
-      ret = updatePartitionColStatsInBatch(tbl, Collections.singletonMap(csd.getPartName(), colStats),
-          validWriteIds, writeId);
+      partVals = getPartValsFromName(tbl, csd.getPartName());
+      parameters = getMS().updatePartitionColumnStatistics(colStats, partVals, validWriteIds, writeId);
+      if (parameters != null) {
+        if (transactionalListeners != null && !transactionalListeners.isEmpty()) {
+          MetaStoreListenerNotifier.notifyEvent(transactionalListeners,
+              EventType.UPDATE_PARTITION_COLUMN_STAT,
+              new UpdatePartitionColumnStatEvent(colStats, partVals, parameters, tbl, writeId, this));
+        }
+        if (!listeners.isEmpty()) {

Review comment:
   This is the old code, so I have kept it as it is. Other methods are also using it the same way.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -9037,8 +9061,16 @@ public boolean set_aggr_stats_for(SetPartitionsStatsRequest request) throws TExc
       } else { // No merge.
         Table t = getTable(catName, dbName, tableName);
         // We don't short-circuit on errors here anymore. That can leave acid stats invalid.
-        ret = updatePartitionColStatsInBatch(t, newStatsMap,
-            request.getValidWriteIdList(), request.getWriteId());
+        if (MetastoreConf.getBoolVar(getConf(), ConfVars.TRY_DIRECT_SQL)) {

Review comment:
   I think for get operations it's fine. But for set operations we have to start a transaction in the HMS handler, and for the direct SQL flow that may cause issues.
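
   For readers following the thread, the transaction handling in question follows the usual HMSHandler pattern; a minimal sketch (the actual stats update is elided):

   ```java
   // Illustrative pattern only: open a metastore transaction for the ORM path
   // and make sure it is rolled back if the commit never happens.
   RawStore ms = getMS();
   boolean committed = false;
   ms.openTransaction();
   try {
     // ... update the partition column statistics through the ORM ...
     committed = ms.commitTransaction();
   } finally {
     if (!committed) {
       ms.rollbackTransaction();
     }
   }
   ```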




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609543)
Time Spent: 50m  (was: 40m)

> Update column stat throws NPE if direct sql is disabled
> ---
>
> Key: HIVE-25225
> URL: https://issues.apache.org/jira/browse/HIVE-25225
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In case direct sql is disabled, the MetaStoreDirectSql object is not 
> initialised and that's causing an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609542
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:58
Start Date: 10/Jun/21 07:58
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648943168



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
       destTableId++;
       // Create the work for moving the table
       // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-      if (!isNonNativeTable) {
+      if (!isNonNativeTable || destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
   Can you give some context why this is needed? Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609542)
Time Spent: 1.5h  (was: 1h 20m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609541
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:56
Start Date: 10/Jun/21 07:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648942046



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
   I think it should not be `null`, otherwise we will have a 
`NullPointerException` in `EXPLAIN ...` commands: 
https://github.com/apache/hive/blob/a97448f84167e4e8c3615908556fe2e4163a43ca/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java#L165-L168




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609541)
Time Spent: 1h 20m  (was: 1h 10m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609539
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:55
Start Date: 10/Jun/21 07:55
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648941044



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
 return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+    return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
   It's a bit strange that we use the term native commit but only for 
non-native tables :) But to be honest, I couldn't find a better name yet 
either, because `nativeCommit` still kinda makes sense :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609539)
Time Spent: 1h 10m  (was: 1h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609537
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:51
Start Date: 10/Jun/21 07:51
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648937961



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
 return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
 getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {
+      updateMetrics(NUM_OBSOLETE_DELTAS,
+          obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
+      updateMetrics(NUM_DELTAS,
+          deltaCache, deltaTopN, deltasThreshold, counters);
+      updateMetrics(NUM_SMALL_DELTAS,
+          smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    }
   }
 
-  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec,
-      float deltaPctThreshold, EnumMap<DeltaFilesMetricType, Queue<Pair<String, Integer>>> deltaFilesStats) throws IOException {
-    long baseSize = getBaseSize(dir);
-    int numObsoleteDeltas = getNumObsoleteDeltas(dir, checkThresholdInSec);
+  public static void mergeDeltaFilesStats(AcidDirectory dir, long checkThresholdInSec, float deltaPctThreshold,
+      EnumMap<DeltaFilesMetricType, Queue<Pair<String, Integer>>> deltaFilesStats, Configuration conf) throws IOException {
+    if (MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON)) {

Review comment:
   Ok. Btw using acidMetricsExtEnabled would make this more readable.
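
   i.e. roughly (a sketch only; the flag is cached in `configure()` and reachable through the singleton, so the guard could read):

   ```java
   // Consult the cached flag instead of re-reading MetastoreConf on every call.
   if (!getInstance().acidMetricsExtEnabled) {
     return;
   }
   ```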




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609537)
Time Spent: 1h 20m  (was: 1h 10m)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609533
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:50
Start Date: 10/Jun/21 07:50
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648937001



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -115,41 +117,45 @@ public static DeltaFilesMetricReporter getInstance() {
 return InstanceHolder.instance;
   }
 
-  public static synchronized void init(HiveConf conf){
+  public static synchronized void init(HiveConf conf) {
 getInstance().configure(conf);
   }
 
   public void submit(TezCounters counters) {
-    updateMetrics(NUM_OBSOLETE_DELTAS,
-        obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, counters);
-    updateMetrics(NUM_DELTAS,
-        deltaCache, deltaTopN, deltasThreshold, counters);
-    updateMetrics(NUM_SMALL_DELTAS,
-        smallDeltaCache, smallDeltaTopN, deltasThreshold, counters);
+    if(acidMetricsExtEnabled) {

Review comment:
   Ok, you convinced me




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609533)
Time Spent: 1h 10m  (was: 1h)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609534
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:50
Start Date: 10/Jun/21 07:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648937036



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);
+        throw new HiveException("Error committing job", e);

Review comment:
   Makes sense  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609534)
Time Spent: 1h  (was: 50m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609532
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:49
Start Date: 10/Jun/21 07:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648936862



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);

Review comment:
   Ok. Did the full refactoring then




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609532)
Time Spent: 50m  (was: 40m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609531
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:48
Start Date: 10/Jun/21 07:48
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648935819



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -230,23 +240,26 @@ private static long getDirSize(AcidUtils.ParsedDirectory dir, FileSystem fs) thr
   .sum();
   }
 
-  private void configure(HiveConf conf){
-    deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD);
-    obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD);
-
-    initMetricsCache(conf);
-    long reportingInterval = HiveConf.getTimeVar(conf,
-        HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS);
-
-    ThreadFactory threadFactory =
-      new ThreadFactoryBuilder()
-        .setDaemon(true)
-        .setNameFormat("DeltaFilesMetricReporter %d")
-        .build();
-    executorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
-    executorService.scheduleAtFixedRate(
-        new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS);
-    LOG.info("Started DeltaFilesMetricReporter thread");
+  private void configure(HiveConf conf) {
+    acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON);
+    if (acidMetricsExtEnabled) {

Review comment:
   This is only executed once when the HS2 starts up, and MSConf is accessed a lot when this happens anyway. I don't think exposing MSConf would be a problem.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609531)
Time Spent: 1h  (was: 50m)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609526
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:45
Start Date: 10/Jun/21 07:45
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648933458



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -206,7 +216,7 @@ public static void backPropagateAcidMetrics(JobConf jobConf, Configuration conf)
   }
 
   public static void close() {
-if (getInstance() != null) {
+if (getInstance() != null && getInstance().acidMetricsExtEnabled) {

Review comment:
   Oh, then probably null checking the executorService would be best here
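
   i.e. roughly (a sketch, assuming `executorService` stays null when the extended metrics are disabled):

   ```java
   // Guard on the executor itself rather than on the feature flag.
   public static void close() {
     if (getInstance() != null && getInstance().executorService != null) {
       getInstance().executorService.shutdownNow();
     }
   }
   ```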




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609526)
Time Spent: 50m  (was: 40m)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609519
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:35
Start Date: 10/Jun/21 07:35
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648926231



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -317,6 +320,36 @@ public int execute() {
 }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
   Can `tableDesc.getProperties()` be null?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609519)
Time Spent: 40m  (was: 0.5h)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25081) Put metrics collection behind a feature flag

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25081?focusedWorklogId=609513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609513
 ]

ASF GitHub Bot logged work on HIVE-25081:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:27
Start Date: 10/Jun/21 07:27
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2332:
URL: https://github.com/apache/hive/pull/2332#discussion_r648920936



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -454,6 +454,8 @@ public static ConfVars getMetaConf(String name) {
 "hive.metastore.acidmetrics.check.interval", 300,
 TimeUnit.SECONDS,
 "Time in seconds between acid related metric collection runs."),
+METASTORE_ACIDMETRICS_EXT_ON("metastore.acidmetrics.ext.on", 
"hive.metastore.acidmetrics.ext.on", true,
+"Whether to collect additional acid related metrics outside of the 
acid metrics service."),

Review comment:
   Yes, right:)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609513)
Time Spent: 40m  (was: 0.5h)

> Put metrics collection behind a feature flag
> 
>
> Key: HIVE-25081
> URL: https://issues.apache.org/jira/browse/HIVE-25081
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Most metrics we're creating are collected in AcidMetricsService, which is 
> behind a feature flag. However there are some metrics that are collected 
> outside of the service. These should be behind a feature flag in addition to 
> hive.metastore.metrics.enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25222:
--
Labels: pull-request-available  (was: )

> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.
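
To make the failure concrete, a small illustration with assumed column names:

```java
// Three columns, one of which contains a comma in its name.
String readColumnNames = "id,birth_date,employ,ee"; // as built by ColumnProjectionUtils
String[] cols = readColumnNames.split(",");
// cols == ["id", "birth_date", "employ", "ee"]: four entries instead of three,
// so a reader rebuilding its schema from this list resolves the wrong columns.
```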



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25222?focusedWorklogId=609510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609510
 ]

ASF GitHub Bot logged work on HIVE-25222:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:25
Start Date: 10/Jun/21 07:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2368:
URL: https://github.com/apache/hive/pull/2368#discussion_r648912392



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java
##
@@ -201,8 +203,11 @@ public static Schema readSchema(Configuration conf) {
   }
 
   public static String[] selectedColumns(Configuration conf) {
-String[] readColumns = conf.getStrings(InputFormatConfig.SELECTED_COLUMNS);
-return readColumns != null && readColumns.length > 0 ? readColumns : null;
+String readColumns = conf.get(InputFormatConfig.SELECTED_COLUMNS);
+if (readColumns == null || readColumns.isEmpty()) {
+  return null;
+}

Review comment:
   nit: new line




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609510)
Remaining Estimate: 0h
Time Spent: 10m

> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25231:
--
Labels: pull-request-available  (was: )

> Add an ability to migrate CSV generated to hive table in replstats
> --
>
> Key: HIVE-25231
> URL: https://issues.apache.org/jira/browse/HIVE-25231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add an option to replstats.sh to load the CSV generated using the replication 
> policy into a hive table/view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25231?focusedWorklogId=609507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609507
 ]

ASF GitHub Bot logged work on HIVE-25231:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:22
Start Date: 10/Jun/21 07:22
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2379:
URL: https://github.com/apache/hive/pull/2379


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609507)
Remaining Estimate: 0h
Time Spent: 10m

> Add an ability to migrate CSV generated to hive table in replstats
> --
>
> Key: HIVE-25231
> URL: https://issues.apache.org/jira/browse/HIVE-25231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add an option to replstats.sh to load the CSV generated using the replication 
> policy into a hive table/view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609506
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:21
Start Date: 10/Jun/21 07:21
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648916479



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);
+        throw new HiveException("Error committing job", e);

Review comment:
   Might be worth including the jobID and tableName into the exception 
message here
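
   e.g. something along these lines (a sketch of the suggestion, not the final wording):

   ```java
   throw new HiveException(String.format("Error committing job %s for table %s",
       jobContext.get().getJobID(), tableName), e);
   ```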




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609506)
Time Spent: 0.5h  (was: 20m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25208) Refactor Iceberg commit to the MoveTask/MoveWork

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25208?focusedWorklogId=609505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609505
 ]

ASF GitHub Bot logged work on HIVE-25208:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:20
Start Date: 10/Jun/21 07:20
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648915562



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
 return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);

Review comment:
   Maybe it's enough to pass in the `jobContext` to this method?  
`getJobContextForCommitOrAbort` should give back the same result on the second 
invocation too
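
   In other words, a sketch of the suggested shape (the changed `rollbackInsertTable` signature is an assumption):

   ```java
   // Compute the JobContext once and reuse it for both commit and rollback.
   Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
   if (jobContext.isPresent()) {
     try {
       new HiveIcebergOutputCommitter().commitJob(jobContext.get());
     } catch (Throwable e) {
       LOG.error("Error while trying to commit job, starting rollback", e);
       rollbackInsertTable(jobContext.get()); // assumed new signature taking the context
       throw new HiveException("Error committing job", e);
     }
   }
   ```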




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609505)
Time Spent: 20m  (was: 10m)

> Refactor Iceberg commit to the MoveTask/MoveWork
> 
>
> Key: HIVE-25208
> URL: https://issues.apache.org/jira/browse/HIVE-25208
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Instead of committing Iceberg changes in `DefaultMetaHook.preCommitInsert` we 
> should commit in MoveWork so we are using the same flow as normal tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25230) add position and occurrence to instr()

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25230?focusedWorklogId=609504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609504
 ]

ASF GitHub Bot logged work on HIVE-25230:
-

Author: ASF GitHub Bot
Created on: 10/Jun/21 07:19
Start Date: 10/Jun/21 07:19
Worklog Time Spent: 10m 
  Work Description: stiga-huang opened a new pull request #2378:
URL: https://github.com/apache/hive/pull/2378


   
   
   ### What changes were proposed in this pull request?
   
   This PR extends the INSTR() function to support optional position and occurrence arguments, which are supported in Oracle, Impala, etc.
   https://docs.oracle.com/database/121/SQLRF/functions089.htm#SQLRF00651
   
https://impala.apache.org/docs/build/html/topics/impala_string_functions.html#string_functions__instr
   
   ### Why are the changes needed?
   
   This provides users with more functionality in the INSTR() function.
   It also helps to reduce the SQL difference between Hive and Impala.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, INSTR() will have two more optional arguments, which need to be documented as well.
   ```
   instr(str, substr[, pos[, occurrence]]) - Returns the index of the given 
occurrence of substr in str after position pos
   
   pos is a 1-based index. If pos < 0, the starting position is
   determined by counting backwards from the end of str and then Hive
   searches backward from the resulting position.
   occurrence is also a 1-based index. The value must be positive.
   If occurrence is greater than the number of matching occurrences,
   the function returns 0.
   If either of the optional arguments, pos or occurrence, is NULL,
   the function also returns NULL.
   Example:
 > SELECT instr('Facebook', 'boo') FROM src LIMIT 1;
 5
 > SELECT instr('CORPORATE FLOOR','OR', 3, 2) FROM src LIMIT 1;
 14  
 > SELECT instr('CORPORATE FLOOR','OR', -3, 2) FROM src LIMIT 1;
 2
   ```
   
   ### How was this patch tested?
   
   Tests were added in udf_instr_wrong_args_len2.q, 
udf_instr_wrong_occurrence.q, and udf_instr.q.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 609504)
Remaining Estimate: 0h
Time Spent: 10m

> add position and occurrence to instr()
> --
>
> Key: HIVE-25230
> URL: https://issues.apache.org/jira/browse/HIVE-25230
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current instr() only supports two arguments:
> {code:java}
> instr(str, substr) - Returns the index of the first occurance of substr in str
> {code}
> Other systems (Vertica, Oracle, Impala etc) support additional position and 
> occurrence arguments:
> {code:java}
> instr(str, substr[, pos[, occurrence]])
> {code}
> Oracle doc: 
> [https://docs.oracle.com/database/121/SQLRF/functions089.htm#SQLRF00651]
> It'd be nice to support this as well. Otherwise, it's a SQL difference 
> between Impala and Hive.
>  Impala supports this in IMPALA-3973



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25230) add position and occurrence to instr()

2021-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25230:
--
Labels: pull-request-available  (was: )

> add position and occurrence to instr()
> --
>
> Key: HIVE-25230
> URL: https://issues.apache.org/jira/browse/HIVE-25230
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current instr() only supports two arguments:
> {code:java}
> instr(str, substr) - Returns the index of the first occurance of substr in str
> {code}
> Other systems (Vertica, Oracle, Impala etc) support additional position and 
> occurrence arguments:
> {code:java}
> instr(str, substr[, pos[, occurrence]])
> {code}
> Oracle doc: 
> [https://docs.oracle.com/database/121/SQLRF/functions089.htm#SQLRF00651]
> It'd be nice to support this as well. Otherwise, it's a SQL difference 
> between Impala and Hive.
>  Impala supports this in IMPALA-3973



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-10 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25231:
---


> Add an ability to migrate CSV generated to hive table in replstats
> --
>
> Key: HIVE-25231
> URL: https://issues.apache.org/jira/browse/HIVE-25231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Add an option to replstats.sh to load the CSV generated using the replication 
> policy into a hive table/view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)