[jira] [Commented] (HIVE-25198) CTAS external table with camelcase and HMS translation ON is returning 0 records

2021-06-03 Thread Rajkumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357063#comment-17357063
 ] 

Rajkumar Singh commented on HIVE-25198:
---

[~nareshpr] I think this can be taken care of by 
https://issues.apache.org/jira/browse/HIVE-24951

> CTAS external table with camelcase and HMS translation ON is returning 0 
> records
> 
>
> Key: HIVE-25198
> URL: https://issues.apache.org/jira/browse/HIVE-25198
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> create external table TarGet as select * from source;
> The above query creates the table location with camelcase when HMS translation 
> is ON, whereas MoveTask uses the lowercase table path.
> eg., 
> {code:java}
> ==> Desc formatted target <==
> Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet
> ==> MoveTask <==
> INFO : Moving data to directory 
> hdfs:///warehouse/tablespace/external/hive/test.db/target from 
> hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002
> ==> HMS Translation <==
> 2021-06-04 03:02:45,772 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
> dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, 
> type:varchar(10), comment:null)], location: 
> hdfs:///warehouse/tablespace/external/hive/test.db/TarGet,{code}
> After CTAS, Select query on target table will return 0 rows.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25198) CTAS external table with camelcase and HMS translation ON is returning 0 records

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25198:
--
Labels: pull-request-available  (was: )






[jira] [Work logged] (HIVE-25198) CTAS external table with camelcase and HMS translation ON is returning 0 records

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25198?focusedWorklogId=606350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606350
 ]

ASF GitHub Bot logged work on HIVE-25198:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 03:17
Start Date: 04/Jun/21 03:17
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #2350:
URL: https://github.com/apache/hive/pull/2350


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606350)
Remaining Estimate: 0h
Time Spent: 10m






[jira] [Updated] (HIVE-25198) CTAS external table with camelcase and HMS translation ON is returning 0 records

2021-06-03 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-25198:
--
Description: 
create external table TarGet as select * from source;

The above query creates the table location with camelcase when HMS translation is 
ON, whereas MoveTask uses the lowercase table path.

eg., 
{code:java}
==> Desc formatted target <==
Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet

==> MoveTask <==
INFO : Moving data to directory 
hdfs:///warehouse/tablespace/external/hive/test.db/target from 
hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002

==> HMS Translation <==
2021-06-04 03:02:45,772 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:varchar(10), 
comment:null)], location: 
hdfs:///warehouse/tablespace/external/hive/test.db/TarGet,{code}
After CTAS, Select query on target table will return 0 rows.
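The mismatch can be sketched in a few lines. This is a toy model, not Hive code: a Python dict stands in for a case-sensitive HDFS namespace, and the two paths are the ones from the log excerpt above.

```python
# Toy model of the bug: a dict stands in for a case-sensitive HDFS namespace.
fs = {}

table_location = "/warehouse/tablespace/external/hive/test.db/TarGet"  # recorded by HMS translation
move_target = "/warehouse/tablespace/external/hive/test.db/target"     # used by MoveTask

# MoveTask moves the CTAS output under the lower-cased path.
fs[move_target] = ["000000_0"]

# A SELECT scans the registered table location; under case-sensitive
# semantics it is a different key, so no files are found: 0 rows.
print(fs.get(table_location, []))  # prints []
```

Because HDFS paths are case-sensitive, the data lands under `.../target` while the table's metadata points at `.../TarGet`, so the scan over the registered location returns nothing.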

 

  was:
create external table TarGet as select * from source;

Above query creates tableLocation with CamelCase if HMS Translation is ON, 
whereas MoveTask will use lowerCase table path.

eg., 

 
{code:java}
==> Desc formatted target <==
Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet

==> MoveTask <==
INFO : Moving data to directory 
hdfs:///warehouse/tablespace/external/hive/test.db/target from 
hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002

==> HMS Translation <==
2021-06-04 03:02:45,772 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:varchar(10), 
comment:null)], location: 
hdfs:///warehouse/tablespace/external/hive/ajay.db/TarGet,{code}
 

Select query after CTAS will return 0 rows because of this.

 


> CTAS external table with camelcase and HMS translation ON is returning 0 
> records
> 
>
> Key: HIVE-25198
> URL: https://issues.apache.org/jira/browse/HIVE-25198
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> create external table TarGet as select * from source;
> The above query creates the table location with camelcase when HMS translation 
> is ON, whereas MoveTask uses the lowercase table path.
> eg., 
> {code:java}
> ==> Desc formatted target <==
> Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet
> ==> MoveTask <==
> INFO : Moving data to directory 
> hdfs:///warehouse/tablespace/external/hive/test.db/target from 
> hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002
> ==> HMS Translation <==
> 2021-06-04 03:02:45,772 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
> dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, 
> type:varchar(10), comment:null)], location: 
> hdfs:///warehouse/tablespace/external/hive/test.db/TarGet,{code}
> After CTAS, Select query on target table will return 0 rows.
>  





[jira] [Updated] (HIVE-25198) CTAS external table with camelcase and HMS translation ON is returning 0 records

2021-06-03 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-25198:
--
Summary: CTAS external table with camelcase and HMS translation ON is 
returning 0 records  (was: CTAS external table with camelcase & HMS translation 
ON is returning 0 records)

> CTAS external table with camelcase and HMS translation ON is returning 0 
> records
> 
>
> Key: HIVE-25198
> URL: https://issues.apache.org/jira/browse/HIVE-25198
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> create external table TarGet as select * from source;
> The above query creates the table location with camelcase when HMS translation 
> is ON, whereas MoveTask uses the lowercase table path.
> eg., 
>  
> {code:java}
> ==> Desc formatted target <==
> Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet
> ==> MoveTask <==
> INFO : Moving data to directory 
> hdfs:///warehouse/tablespace/external/hive/test.db/target from 
> hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002
> ==> HMS Translation <==
> 2021-06-04 03:02:45,772 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
> dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, 
> type:varchar(10), comment:null)], location: 
> hdfs:///warehouse/tablespace/external/hive/ajay.db/TarGet,{code}
>  
> Select query after CTAS will return 0 rows because of this.
>  





[jira] [Assigned] (HIVE-25198) CTAS external table with camelcase & HMS translation ON is returning 0 records

2021-06-03 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-25198:
-


> CTAS external table with camelcase & HMS translation ON is returning 0 records
> --
>
> Key: HIVE-25198
> URL: https://issues.apache.org/jira/browse/HIVE-25198
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> create external table TarGet as select * from source;
> The above query creates the table location with camelcase when HMS translation 
> is ON, whereas MoveTask uses the lowercase table path.
> eg., 
>  
> {code:java}
> ==> Desc formatted target <==
> Location:  hdfs:///warehouse/tablespace/external/hive/test.db/TarGet
> ==> MoveTask <==
> INFO : Moving data to directory 
> hdfs:///warehouse/tablespace/external/hive/test.db/target from 
> hdfs:///warehouse/tablespace/external/hive/test.db/.hive-staging_hive_2021-06-04_03-02-36_272_669287187808252905-12/-ext-10002
> ==> HMS Translation <==
> 2021-06-04 03:02:45,772 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-7-thread-8]: Transformer returning table:Table(tableName:TarGet, 
> dbName:test, owner:hive, createTime:1622775765, lastAccessTime:0, 
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, 
> type:varchar(10), comment:null)], location: 
> hdfs:///warehouse/tablespace/external/hive/ajay.db/TarGet,{code}
>  
> Select query after CTAS will return 0 rows because of this.
>  





[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=606345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606345
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 02:40
Start Date: 04/Jun/21 02:40
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-854315065


   Hey @belugabehr, could you please take a look at the changes? Thanks :)




Issue Time Tracking
---

Worklog Id: (was: 606345)
Time Spent: 5h 40m  (was: 5.5h)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation log, 
> and the operation log is deleted when the operation closes (with a delay for 
> canceled operations). It is often not easy for the user (JDBC) or 
> administrators to dig into the details of a finished (or failed) operation, so 
> we present the operation log on the web UI and keep the operation log for some 
> time for later analysis.





[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=606343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606343
 ]

ASF GitHub Bot logged work on HIVE-25055:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 02:33
Start Date: 04/Jun/21 02:33
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2218:
URL: https://github.com/apache/hive/pull/2218#issuecomment-854312440


   Hey @belugabehr,  
[HIVE-25126](https://issues.apache.org/jira/browse/HIVE-25126) seems to need a 
lot of changes; maybe we should push it step by step to accomplish that. What do 
you think? 
   Thanks.




Issue Time Tracking
---

Worklog Id: (was: 606343)
Time Spent: 2h 50m  (was: 2h 40m)

> Improve the exception handling in HMSHandler
> 
>
> Key: HIVE-25055
> URL: https://issues.apache.org/jira/browse/HIVE-25055
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25055) Improve the exception handling in HMSHandler

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25055?focusedWorklogId=606341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606341
 ]

ASF GitHub Bot logged work on HIVE-25055:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 02:25
Start Date: 04/Jun/21 02:25
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2218:
URL: https://github.com/apache/hive/pull/2218#discussion_r645246670



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -914,6 +914,7 @@ private boolean isViewTable(String catName, String dbName, 
String tblName) throw
 long queryTime = doTrace ? System.nanoTime() : 0;
 MetastoreDirectSqlUtils.timingTrace(doTrace, queryText, start, queryTime);
 if (sqlResult.isEmpty()) {
+  query.closeAll();

Review comment:
   Moved to https://github.com/apache/hive/pull/2344






Issue Time Tracking
---

Worklog Id: (was: 606341)
Time Spent: 2h 40m  (was: 2.5h)

> Improve the exception handling in HMSHandler
> 
>
> Key: HIVE-25055
> URL: https://issues.apache.org/jira/browse/HIVE-25055
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-25197) logger level doesn't effect when set on hive-cli starting command

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357018#comment-17357018
 ] 

Zhihua Deng commented on HIVE-25197:


Could you please try with "-hiveconf hive.root.logger=ERROR,console"?
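For context on why the root-logger level gates what reaches the console, here is the same behavior sketched with Python's logging module. This is an analogy for log4j, not Hive code: a root logger set to ERROR drops INFO records before they reach the console handler, which is what `hive.root.logger=ERROR,console` asks of log4j.

```python
import io
import logging

# Root logger at ERROR with a console-style handler, analogous to
# starting the CLI with --hiveconf hive.root.logger=ERROR,console.
stream = io.StringIO()
root = logging.getLogger()
root.addHandler(logging.StreamHandler(stream))
root.setLevel(logging.ERROR)

logging.info("noisy startup detail")    # suppressed at ERROR level
logging.error("something went wrong")   # emitted

print(stream.getvalue().strip())  # prints: something went wrong
```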

> logger level doesn't effect when set on hive-cli starting command
> -
>
> Key: HIVE-25197
> URL: https://issues.apache.org/jira/browse/HIVE-25197
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 3.1.2
> Environment: hive: 3.1.2
>Reporter: Spongebob
>Priority: Minor
> Attachments: image-2021-06-04-09-43-21-502.png
>
>
> I am trying to start Hive via "hive --hiveconf hive.root.logger=ERROR,DRFA", 
> but it doesn't take effect. How can I hide these INFO logs when using the Hive CLI?
> !image-2021-06-04-09-43-21-502.png!





[jira] [Resolved] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-25188.

Resolution: Not A Problem

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> Exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)
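The failure mode can be illustrated outside Hive. This is a Python sketch of the type mismatch, not HiveJsonReader itself, and the pre-serialization workaround is an assumption about one way to keep a string-typed column, not the project's resolution:

```python
import json

record = json.loads(
    '{"data": {"H": {"event": "track_active"}}, "messageId": "2475185636801962"}'
)

# "data" parses to a dict (a JSON object), not a string, so a string-typed
# column cannot accept it as-is; this mirrors the IllegalArgumentException
# raised in visitLeafNode above.
assert not isinstance(record["data"], str)

# One way to keep a string column: re-serialize the nested object.
record["data"] = json.dumps(record["data"])
print(record["data"])  # prints: {"H": {"event": "track_active"}}
```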





[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=606333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606333
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 01:51
Start Date: 04/Jun/21 01:51
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r645236553



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -210,27 +213,34 @@ private void stopWorkers() {
 }
   }
 
-  private List processOneTable(TableName fullTableName)
+  private List processOneTable(TableName fullTableName, 
Map dbsToSkip)
   throws MetaException, NoSuchTxnException, NoSuchObjectException {
 if (isAnalyzeTableInProgress(fullTableName)) return null;
 String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = 
fullTableName.getTable();
+String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+if (!dbsToSkip.containsKey(dbName)) {
+  Database database = rs.getDatabase(cat, db);
+  boolean skipDb = false;
+  if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+skipDb = true;
+LOG.info("Skipping all the tables which belong to database: {} as it 
is being failed over", db);
+  } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
   We already had two separate methods declared in both ReplUtils and 
PartitionManagementTask because the ReplUtils package is not accessible in 
metastore threads. But the method can be moved to MetastoreUtils, where it is 
accessible to both of them.






Issue Time Tracking
---

Worklog Id: (was: 606333)
Time Spent: 4h  (was: 3h 50m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failoved over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357000#comment-17357000
 ] 

Zhihua Deng edited comment on HIVE-25188 at 6/4/21, 1:24 AM:
-

{quote}The "data" field is not a valid JSON String type and therefore we should 
not allow this type of interaction. 
{quote}
That's a good clarification, thanks; closing the PR.


was (Author: dengzh):
{quote}

The "data" field is not a valid JSON String type and therefore we should not 
allow this type of interaction. 

{quote}

That's a good clarify of this, thanks, close the pr.






[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357000#comment-17357000
 ] 

Zhihua Deng commented on HIVE-25188:


{quote}

The "data" field is not a valid JSON String type and therefore we should not 
allow this type of interaction. 

{quote}

That's a good clarify of this, thanks, close the pr.






[jira] [Comment Edited] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356997#comment-17356997
 ] 

Zhihua Deng edited comment on HIVE-25188 at 6/4/21, 1:18 AM:
-

Thank you for the comments, [~belugabehr]! Agreed that being "too lenient" could 
bring some problems; the main concern is that, compared with the maintenance 
afterwards, it may not be worth solving/improving the problem. The Jira is just 
to solve the read of nested JSON values; no more features, complex logic, or 
config are introduced.

Thanks, Zhihua Deng


was (Author: dengzh):
Thank you for the comments, [~belugabehr]!  Agreed that "too lenient" cloud 
bring some problems, the main concern is that compared the maintenance 
afterwards, it's not worth solving/improving the problem. The Jira is just to 
solve the read of nested json value, no more features or complex logic are 
introduced. 






[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356997#comment-17356997
 ] 

Zhihua Deng commented on HIVE-25188:


Thank you for the comments, [~belugabehr]!  Agreed that being "too lenient" could 
bring some problems; the main concern is whether, compared with the maintenance 
burden afterwards, the problem is worth solving/improving. This Jira only aims to 
fix the reading of nested json values; no new features or complex logic are 
introduced. 

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table is stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> An exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25192) No need to create table directory for the non-native table

2021-06-03 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356984#comment-17356984
 ] 

Zhihua Deng commented on HIVE-25192:


It seems we have used the "path" for table statistics and some operators (fetch); 
closing the PR pending further investigation of the problem...

> No need to create table directory for the non-native table
> --
>
> Key: HIVE-25192
> URL: https://issues.apache.org/jira/browse/HIVE-25192
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating non-native tables like Kudu, HBase and so on, we always create 
> a warehouse location for these tables, even though they may not use the 
> location to store data or in the job plan, so there is no need to create such 
> a location. 
> We should also skip getting the input summary of non-native tables in some 
> cases; this avoids an OOM problem when building the hash table while the 
> non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25188?focusedWorklogId=606312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606312
 ]

ASF GitHub Bot logged work on HIVE-25188:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 00:22
Start Date: 04/Jun/21 00:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #2341:
URL: https://github.com/apache/hive/pull/2341


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606312)
Time Spent: 20m  (was: 10m)

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table is stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> An exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25192) No need to create table directory for the non-native table

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25192?focusedWorklogId=606313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606313
 ]

ASF GitHub Bot logged work on HIVE-25192:
-

Author: ASF GitHub Bot
Created on: 04/Jun/21 00:22
Start Date: 04/Jun/21 00:22
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #2346:
URL: https://github.com/apache/hive/pull/2346


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606313)
Time Spent: 20m  (was: 10m)

> No need to create table directory for the non-native table
> --
>
> Key: HIVE-25192
> URL: https://issues.apache.org/jira/browse/HIVE-25192
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating non-native tables like Kudu, HBase and so on, we always create 
> a warehouse location for these tables, even though they may not use the 
> location to store data or in the job plan, so there is no need to create such 
> a location. 
> We should also skip getting the input summary of non-native tables in some 
> cases; this avoids an OOM problem when building the hash table while the 
> non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=606133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606133
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 19:05
Start Date: 03/Jun/21 19:05
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r645057607



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -210,27 +213,34 @@ private void stopWorkers() {
 }
   }
 
-  private List processOneTable(TableName fullTableName)
+  private List processOneTable(TableName fullTableName, 
Map dbsToSkip)
   throws MetaException, NoSuchTxnException, NoSuchObjectException {
 if (isAnalyzeTableInProgress(fullTableName)) return null;
 String cat = fullTableName.getCat(), db = fullTableName.getDb(), tbl = 
fullTableName.getTable();
+String dbName = MetaStoreUtils.prependCatalogToDbName(cat,db, conf);
+if (!dbsToSkip.containsKey(dbName)) {
+  Database database = rs.getDatabase(cat, db);
+  boolean skipDb = false;
+  if (MetaStoreUtils.isDbBeingFailedOver(database)) {
+skipDb = true;
+LOG.info("Skipping all the tables which belong to database: {} as it 
is being failed over", db);
+  } else if (ReplUtils.isTargetOfReplication(database)) {

Review comment:
   There is a lot of code duplication between this and PartitionManagement. 
Can we not achieve this with a single copy?
   Also, why do we have two methods for isTargetOfReplication()? Can we have 
just one?
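The `dbsToSkip` map in the hunk above caches the per-database decision so the metastore is consulted only once per database per run. A hedged Python sketch of that memoization pattern follows (names like `fetch_db`, `is_failed_over`, and `is_repl_target` are illustrative stand-ins, not Hive APIs, and the real patch has additional logic for replication targets):

```python
def should_skip(db_name, cache, fetch_db, is_failed_over, is_repl_target):
    # Memoize the per-database skip decision so the (expensive)
    # database lookup happens only once per database per run.
    if db_name not in cache:
        db = fetch_db(db_name)
        cache[db_name] = is_failed_over(db) or is_repl_target(db)
    return cache[db_name]

fetches = []
def fetch(name):
    fetches.append(name)          # record each metastore round trip
    return {"name": name}

cache = {}
first = should_skip("test", cache, fetch, lambda d: True, lambda d: False)
second = should_skip("test", cache, fetch, lambda d: True, lambda d: False)
print(first, second, len(fetches))  # True True 1
```

The reviewer's point stands either way: the same caching logic is duplicated in PartitionManagement, so a single shared helper would be preferable.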




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606133)
Time Spent: 3h 50m  (was: 3h 40m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=606086=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606086
 ]

ASF GitHub Bot logged work on HIVE-25194:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 17:43
Start Date: 03/Jun/21 17:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2348:
URL: https://github.com/apache/hive/pull/2348#discussion_r645003383



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##
@@ -81,11 +84,34 @@ public boolean fillStorageFormat(ASTNode child) throws 
SemanticException {
   }
   break;
 case HiveParser.TOK_STORAGEHANDLER:
-  storageHandler = processStorageHandler(child.getChild(0).getText());
-  if (child.getChildCount() == 2) {
-BaseSemanticAnalyzer.readProps(
-  (ASTNode) (child.getChild(1).getChild(0)),
-  serdeProps);
+  for (int i = 0; i < child.getChildCount(); i++) {
+ASTNode grandChild = (ASTNode) child.getChild(i);
+switch (grandChild.getToken().getType()) {
+  case HiveParser.TOK_FILEFORMAT_GENERIC:
+String fileFormatPropertyKey = null;
+try {
+  HiveStorageHandler handler = HiveUtils.getStorageHandler(conf, 
this.storageHandler);
+  fileFormatPropertyKey = handler.getFileFormatPropertyKey();
+} catch (HiveException e) {
+  throw new SemanticException("Failed to load storage handler:  " 
+ e.getMessage());
+}
+
+if (fileFormatPropertyKey != null) {
+  String fileFormat = grandChild.getChild(0).getText();
+  if (serdeProps.containsKey(fileFormatPropertyKey)) {
+throw new SemanticException("Provide only one of the 
following: STORED BY " + fileFormat +
+" or WITH SERDEPROPERTIES('" + fileFormatPropertyKey + 
"'='" + fileFormat + "')");
+  }
+
+  serdeProps.put(fileFormatPropertyKey, fileFormat);

Review comment:
   What happens if the file format is Text or some other valid file format 
which is not supported by Iceberg? 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java
##
@@ -81,11 +84,34 @@ public boolean fillStorageFormat(ASTNode child) throws 
SemanticException {
   }
   break;
 case HiveParser.TOK_STORAGEHANDLER:
-  storageHandler = processStorageHandler(child.getChild(0).getText());
-  if (child.getChildCount() == 2) {
-BaseSemanticAnalyzer.readProps(
-  (ASTNode) (child.getChild(1).getChild(0)),
-  serdeProps);
+  for (int i = 0; i < child.getChildCount(); i++) {

Review comment:
   What happens when StorageHandler is HBase and Stored By is also set? 
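The hunk above also adds a guard against specifying the file format twice: once via `STORED AS` and once via an explicit `SERDEPROPERTIES` entry. A conceptual Python sketch of that guard (the property key `write.format.default` is used for illustration only):

```python
def set_file_format(serde_props, key, file_format):
    # Reject specifying the file format twice (STORED AS plus an
    # explicit SERDEPROPERTIES entry), mirroring the SemanticException
    # thrown in the diff above.
    if key in serde_props:
        raise ValueError(
            "Provide only one of the following: STORED AS %s or "
            "WITH SERDEPROPERTIES('%s'='%s')" % (file_format, key, file_format))
    serde_props[key] = file_format

props = {}
set_file_format(props, "write.format.default", "ORC")
try:
    set_file_format(props, "write.format.default", "PARQUET")
except ValueError as e:
    print("rejected:", e)
```

The reviewer's questions (an unsupported format like Text, or a non-Iceberg storage handler) would need additional validation beyond this duplicate check.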




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606086)
Time Spent: 20m  (was: 10m)

> Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
> --
>
> Key: HIVE-25194
> URL: https://issues.apache.org/jira/browse/HIVE-25194
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg 
> create table statements.
> The ideal syntax would be:
> CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ...
> One complication is that currently stored by and stored as are not permitted 
> within the same query, so that needs to be amended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25188) JsonSerDe: Unable to read the string value from a nested json

2021-06-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356612#comment-17356612
 ] 

David Mollitor commented on HIVE-25188:
---

[~dengzh] As I understand the request, I am not in support of it.  The "data" 
field is not a valid JSON String type and therefore we should not allow this 
type of interaction.  Hive is already far too lenient in what it allows, which 
leads to breakdowns in testing, knowledge debt, and a larger testing surface 
area.  Just my opinion on the matter; maybe others disagree and can chime in.

> JsonSerDe: Unable to read the string value from a nested json
> -
>
> Key: HIVE-25188
> URL: https://issues.apache.org/jira/browse/HIVE-25188
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> create table json_table(data string, messageid string, publish_time bigint, 
> attributes string);
>  
> if the data of the table is stored like:
> {code:java}
> {"data":{"H":{"event":"track_active","platform":"Android"},"B":{"device_type":"Phone","uuid":"[36ffec24-f6a4-4f5d-aa39-72e5513d2cae,11883bee-a7aa-4010-8a66-6c3c63a73f16]"}},"messageId":"2475185636801962","publish_time":1622514629783,"attributes":{"region":"IN"}}"}}{code}
> An exception will be thrown when trying to deserialize the data:
>  
> Caused by: java.lang.IllegalArgumentException
>  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitLeafNode(HiveJsonReader.java:374)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:216)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitStructNode(HiveJsonReader.java:327)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.visitNode(HiveJsonReader.java:221)
>  at 
> org.apache.hadoop.hive.serde2.json.HiveJsonReader.parseStruct(HiveJsonReader.java:198)
>  at org.apache.hadoop.hive.serde2.JsonSerDe.deserialize(JsonSerDe.java:181)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25196) Native Vectorization of GenericUDFSplit function

2021-06-03 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-25196:
---


> Native Vectorization of GenericUDFSplit function
> 
>
> Key: HIVE-25196
> URL: https://issues.apache.org/jira/browse/HIVE-25196
> Project: Hive
>  Issue Type: Improvement
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> Provide faster 'split' function for vector-mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25179) Support all partition transforms for Iceberg in create table

2021-06-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér updated HIVE-25179:
-
Description: 
Enhance table create syntax with support for partition transforms:
{code:sql}
CREATE TABLE ... PARTITIONED BY SPEC( year(year_field), month(month_field), 
day(day_field), hour(hour_field), truncate(3, truncate_field), bucket(5, 
bucket_field bucket), identity_field ) STORED BY ICEBERG;
{code}


  was:
Enhance table create syntax with support to partition transforms:
{code:sql}
CREATE TABLE ... PARTITIONED BY SPEC(
year_field year,
month_field month,
day_field day,
hour_field hour,
truncate_field truncate[3],
bucket_field bucket[5],
identity_field identity
) STORED BY ICEBERG;
{code}



> Support all partition transforms for Iceberg in create table
> 
>
> Key: HIVE-25179
> URL: https://issues.apache.org/jira/browse/HIVE-25179
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Enhance table create syntax with support for partition transforms:
> {code:sql}
> CREATE TABLE ... PARTITIONED BY SPEC( year(year_field), month(month_field), 
> day(day_field), hour(hour_field), truncate(3, truncate_field), bucket(5, 
> bucket_field bucket), identity_field ) STORED BY ICEBERG;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25194:
--
Labels: pull-request-available  (was: )

> Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
> --
>
> Key: HIVE-25194
> URL: https://issues.apache.org/jira/browse/HIVE-25194
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg 
> create table statements.
> The ideal syntax would be:
> CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ...
> One complication is that currently stored by and stored as are not permitted 
> within the same query, so that needs to be amended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25194?focusedWorklogId=606023=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606023
 ]

ASF GitHub Bot logged work on HIVE-25194:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:59
Start Date: 03/Jun/21 15:59
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2348:
URL: https://github.com/apache/hive/pull/2348


   
   
   ### What changes were proposed in this pull request?
   Support `STORED AS fileformat` syntax together with `STORED BY ICEBERG` or 
`STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'`
   
   Work covered in this PR:
   - Enhance semantic rule to support both `STORED AS` and `STORED BY` together
   - Updated SemanticAnalyzer and StorageFormat to parse the new query format
   - Created q tests
   
   
   
   
   ### Why are the changes needed?
   Currently, the file format is configurable through the `TBLPROPERTIES` which 
can be inconvenient, if the end-user doesn't know which table property key 
should be used to specify the file format. 
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   New syntax:
   `CREATE TABLE ... STORED BY ICEBERG [WITH SERDEPROPERTIES()][STORED AS 
fileformat]`
   
   
   
   ### How was this patch tested?
   Manual test, q test
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606023)
Remaining Estimate: 0h
Time Spent: 10m

> Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
> --
>
> Key: HIVE-25194
> URL: https://issues.apache.org/jira/browse/HIVE-25194
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg 
> create table statements.
> The ideal syntax would be:
> CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ...
> One complication is that currently stored by and stored as are not permitted 
> within the same query, so that needs to be amended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606019
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:52
Start Date: 03/Jun/21 15:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644915386



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -384,24 +382,28 @@ private void collectCommitInformation(TezWork work) 
throws IOException, TezExcep
   // get all target tables this vertex wrote to
   List<String> tables = new ArrayList<>();
   for (Map.Entry<String, String> entry : jobConf) {
-if (entry.getKey().startsWith("iceberg.mr.serialized.table.")) 
{
-  
tables.add(entry.getKey().substring("iceberg.mr.serialized.table.".length()));
+if 
(entry.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+  
tables.add(entry.getKey().substring(ICEBERG_SERIALIZED_TABLE_PREFIX.length()));
 }
   }
-  // save information for each target table (jobID, task num, 
query state)
+  // find iceberg props in jobConf as they can be needed, but not 
available, during job commit
+  Map<String, String> icebergProperties = new HashMap<>();
+  jobConf.forEach(e -> {
+// don't copy the serialized tables, they're not needed 
anymore and take up lots of space
+if (e.getKey().startsWith("iceberg.mr.") && 
!e.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+  icebergProperties.put(e.getKey(), e.getValue());
+}
+  });
+  // save information for each target table (jobID, task num)
   for (String table : tables) {
-sessionConf.set(HIVE_TEZ_COMMIT_JOB_ID_PREFIX + table, 
jobIdStr);
-sessionConf.setInt(HIVE_TEZ_COMMIT_TASK_COUNT_PREFIX + table,
-status.getProgress().getSucceededTaskCount());
+SessionStateUtil.newCommitInfo(jobConf, table)

Review comment:
   This is a little bit odd to me.
   I mean I understand what you did, but it still feels strange. Convince me 
 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606019)
Time Spent: 1.5h  (was: 1h 20m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606014
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:47
Start Date: 03/Jun/21 15:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644910919



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -384,24 +382,28 @@ private void collectCommitInformation(TezWork work) 
throws IOException, TezExcep
   // get all target tables this vertex wrote to
   List<String> tables = new ArrayList<>();
   for (Map.Entry<String, String> entry : jobConf) {
-if (entry.getKey().startsWith("iceberg.mr.serialized.table.")) 
{
-  
tables.add(entry.getKey().substring("iceberg.mr.serialized.table.".length()));
+if 
(entry.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+  
tables.add(entry.getKey().substring(ICEBERG_SERIALIZED_TABLE_PREFIX.length()));
 }
   }
-  // save information for each target table (jobID, task num, 
query state)
+  // find iceberg props in jobConf as they can be needed, but not 
available, during job commit
+  Map<String, String> icebergProperties = new HashMap<>();
+  jobConf.forEach(e -> {

Review comment:
   We iterated through the jobConf a few lines ago. It might be worth 
considering merging the loops, if everything else fails
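The reviewer's suggestion is that the table-name collection and the property collection can be folded into one pass over the configuration. A small Python sketch of that single-pass prefix scan (prefixes taken from the diff; a plain dict stands in for `jobConf`):

```python
SERIALIZED_PREFIX = "iceberg.mr.serialized.table."

def scan_job_conf(job_conf):
    # Single pass: collect the target table names and copy the
    # iceberg.mr.* properties, skipping the bulky serialized-table
    # entries that are no longer needed after planning.
    tables, props = [], {}
    for key, value in job_conf.items():
        if key.startswith(SERIALIZED_PREFIX):
            tables.append(key[len(SERIALIZED_PREFIX):])
        elif key.startswith("iceberg.mr."):
            props[key] = value
    return tables, props

conf = {
    "iceberg.mr.serialized.table.db.tbl": "<big payload>",
    "iceberg.mr.commit.table": "db.tbl",
    "mapreduce.job.name": "query1",
}
tables, props = scan_job_conf(conf)
print(tables)  # ['db.tbl']
print(props)   # {'iceberg.mr.commit.table': 'db.tbl'}
```

This addresses the cost concern raised elsewhere in the thread: a large `jobConf` is walked once instead of twice.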




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606014)
Time Spent: 1h 20m  (was: 1h 10m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606013
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:47
Start Date: 03/Jun/21 15:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644910919



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -384,24 +382,28 @@ private void collectCommitInformation(TezWork work) 
throws IOException, TezExcep
   // get all target tables this vertex wrote to
   List<String> tables = new ArrayList<>();
   for (Map.Entry<String, String> entry : jobConf) {
-if (entry.getKey().startsWith("iceberg.mr.serialized.table.")) 
{
-  
tables.add(entry.getKey().substring("iceberg.mr.serialized.table.".length()));
+if 
(entry.getKey().startsWith(ICEBERG_SERIALIZED_TABLE_PREFIX)) {
+  
tables.add(entry.getKey().substring(ICEBERG_SERIALIZED_TABLE_PREFIX.length()));
 }
   }
-  // save information for each target table (jobID, task num, 
query state)
+  // find iceberg props in jobConf as they can be needed, but not 
available, during job commit
+  Map<String, String> icebergProperties = new HashMap<>();
+  jobConf.forEach(e -> {

Review comment:
   We iterated through the jobConf a few lines ago. It might be worth 
considering merging the loops, if all else fails




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606013)
Time Spent: 1h 10m  (was: 1h)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606012
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:44
Start Date: 03/Jun/21 15:44
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644908593



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -384,24 +382,28 @@ private void collectCommitInformation(TezWork work) 
throws IOException, TezExcep
   // get all target tables this vertex wrote to
   List<String> tables = new ArrayList<>();
   for (Map.Entry<String, String> entry : jobConf) {

Review comment:
   Do we have a faster solution for this? The `jobConf` could be very, very 
big




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 606012)
Time Spent: 1h  (was: 50m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606010
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:42
Start Date: 03/Jun/21 15:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644907549



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -183,10 +183,9 @@ private void createTableForCTAS(Configuration 
configuration, Properties serDePro
 serDeProperties.get(Catalogs.NAME), tableSchema, 
serDeProperties.get(InputFormatConfig.PARTITION_SPEC));
 Catalogs.createTable(configuration, serDeProperties);
 
-// set these in the global conf so that we can rollback the table in the 
lifecycle hook in case of failures
-String queryId = configuration.get(HiveConf.ConfVars.HIVEQUERYID.varname);
-configuration.set(String.format(InputFormatConfig.IS_CTAS_QUERY_TEMPLATE, 
queryId), "true");
-
configuration.set(String.format(InputFormatConfig.CTAS_TABLE_NAME_TEMPLATE, 
queryId),
+// set these in the query state so that we can rollback the table in the 
lifecycle hook in case of failures
+SessionStateUtil.addResource(configuration, 
InputFormatConfig.IS_CTAS_QUERY, "true");

Review comment:
   Do we need both?
   Maybe if we have `CTAS_TABLE_NAME` then `IS_CTAS_QUERY` is `true`
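The suggested simplification can be sketched as follows; the names are illustrative rather than the actual InputFormatConfig constants. Storing only the table name makes the boolean flag derivable, so the two values cannot drift apart:

```java
import java.util.Optional;

// If the CTAS table name is stored, a separate IS_CTAS_QUERY flag is
// redundant: "is CTAS" is simply the presence of the table name.
public class CtasFlagSketch {
    private String ctasTableName; // null when the query is not a CTAS

    void setCtasTableName(String name) {
        this.ctasTableName = name;
    }

    Optional<String> getCtasTableName() {
        return Optional.ofNullable(ctasTableName);
    }

    boolean isCtasQuery() {
        return getCtasTableName().isPresent();
    }

    public static void main(String[] args) {
        CtasFlagSketch state = new CtasFlagSketch();
        System.out.println(state.isCtasQuery()); // prints false
        state.setCtasTableName("db.target");
        System.out.println(state.isCtasQuery()); // prints true
    }
}
```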






Issue Time Tracking
---

Worklog Id: (was: 606010)
Time Spent: 50m  (was: 40m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606006
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:36
Start Date: 03/Jun/21 15:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644899593



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -459,35 +454,35 @@ public void 
rollbackInsertTable(org.apache.hadoop.hive.metastore.api.Table table
   throws MetaException {
 String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
 JobContext jobContext = getJobContextForCommitOrAbort(tableName, 
overwrite);
-OutputCommitter committer = new HiveIcebergOutputCommitter();
-try {
-  LOG.info("rollbackInsertTable: Aborting job for jobID: {} and table: 
{}", jobContext.getJobID(), tableName);
-  committer.abortJob(jobContext, JobStatus.State.FAILED);
-} catch (IOException e) {
-  LOG.error("Error while trying to abort failed job. There might be 
uncleaned data files.", e);
-  // no throwing here because the original commitInsertTable exception 
should be propagated
-} finally {
-  // avoid config pollution with prefixed/suffixed keys
-  cleanCommitConfig(tableName);
+if (jobContext != null) {
+  OutputCommitter committer = new HiveIcebergOutputCommitter();
+  try {
+LOG.info("rollbackInsertTable: Aborting job for jobID: {} and table: 
{}", jobContext.getJobID(), tableName);
+committer.abortJob(jobContext, JobStatus.State.FAILED);
+  } catch (IOException e) {
+LOG.error("Error while trying to abort failed job. There might be 
uncleaned data files.", e);
+// no throwing here because the original commitInsertTable exception 
should be propagated
+  }
 }
   }
 
-  private void cleanCommitConfig(String tableName) {
-conf.unset(TezTask.HIVE_TEZ_COMMIT_JOB_ID_PREFIX + tableName);
-conf.unset(TezTask.HIVE_TEZ_COMMIT_TASK_COUNT_PREFIX + tableName);
-conf.unset(InputFormatConfig.SERIALIZED_TABLE_PREFIX + tableName);
-conf.unset(InputFormatConfig.OUTPUT_TABLES);
-  }
-
   private JobContext getJobContextForCommitOrAbort(String tableName, boolean 
overwrite) {

Review comment:
   Maybe Optional?
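A sketch of what an Optional-returning variant could look like. JobContext is replaced by a plain String here to stay self-contained, and the method names only mirror the ones in the quoted diff; returning `Optional` makes the absent-context case explicit at every call site instead of relying on a null check:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Optional-returning variant of a nullable lookup.
public class OptionalContextSketch {
    private final Map<String, String> contextsByTable = new HashMap<>();

    void registerContext(String tableName, String jobContext) {
        contextsByTable.put(tableName, jobContext);
    }

    Optional<String> getJobContextForCommitOrAbort(String tableName) {
        return Optional.ofNullable(contextsByTable.get(tableName));
    }

    // The explicit "if (jobContext != null)" from the patch becomes map/orElse
    boolean commitInsertTable(String tableName) {
        return getJobContextForCommitOrAbort(tableName)
                .map(ctx -> {
                    System.out.println("committing job " + ctx);
                    return true;
                })
                .orElse(false);
    }

    public static void main(String[] args) {
        OptionalContextSketch hook = new OptionalContextSketch();
        System.out.println(hook.commitInsertTable("db.t1")); // prints false: no context
        hook.registerContext("db.t1", "job_123");
        System.out.println(hook.commitInsertTable("db.t1")); // prints true
    }
}
```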






Issue Time Tracking
---

Worklog Id: (was: 606006)
Time Spent: 40m  (was: 0.5h)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606003
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:36
Start Date: 03/Jun/21 15:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644899221



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -437,19 +438,13 @@ public void 
commitInsertTable(org.apache.hadoop.hive.metastore.api.Table table,
   throws MetaException {
 String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
 JobContext jobContext = getJobContextForCommitOrAbort(tableName, 
overwrite);
-boolean failure = false;
-try {
-  OutputCommitter committer = new HiveIcebergOutputCommitter();
-  committer.commitJob(jobContext);
-} catch (Exception e) {
-  failure = true;
-  LOG.error("Error while trying to commit job", e);
-  throw new MetaException(StringUtils.stringifyException(e));
-} finally {
-  // if there's a failure, the configs will still be needed in 
rollbackInsertTable
-  if (!failure) {
-// avoid config pollution with prefixed/suffixed keys
-cleanCommitConfig(tableName);
+if (jobContext != null) {

Review comment:
   Oh... I found the reason behind it  






Issue Time Tracking
---

Worklog Id: (was: 606003)
Time Spent: 0.5h  (was: 20m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=606001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-606001
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:33
Start Date: 03/Jun/21 15:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2347:
URL: https://github.com/apache/hive/pull/2347#discussion_r644897608



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
##
@@ -437,19 +438,13 @@ public void 
commitInsertTable(org.apache.hadoop.hive.metastore.api.Table table,
   throws MetaException {
 String tableName = TableIdentifier.of(table.getDbName(), 
table.getTableName()).toString();
 JobContext jobContext = getJobContextForCommitOrAbort(tableName, 
overwrite);
-boolean failure = false;
-try {
-  OutputCommitter committer = new HiveIcebergOutputCommitter();
-  committer.commitJob(jobContext);
-} catch (Exception e) {
-  failure = true;
-  LOG.error("Error while trying to commit job", e);
-  throw new MetaException(StringUtils.stringifyException(e));
-} finally {
-  // if there's a failure, the configs will still be needed in 
rollbackInsertTable
-  if (!failure) {
-// avoid config pollution with prefixed/suffixed keys
-cleanCommitConfig(tableName);
+if (jobContext != null) {

Review comment:
   Is this a bugfix?
   Do we expect null here?






Issue Time Tracking
---

Worklog Id: (was: 606001)
Time Spent: 20m  (was: 10m)

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Commented] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol

2021-06-03 Thread Matt McCline (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356510#comment-17356510
 ] 

Matt McCline commented on HIVE-25191:
-

[~thejas] Thank you very much for your comments! – studying.

> Modernize Hive Thrift CLI Service Protocol
> --
>
> Key: HIVE-25191
> URL: https://issues.apache.org/jira/browse/HIVE-25191
> Project: Hive
>  Issue Type: Improvement
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> Unnecessary errors are occurring with the advent of proxy use such as 
> Gateways between the Hive client and Hive Server 2. Query failures can be due 
> to arbitrary proxy timeouts. This proposal avoids the timeouts by changing 
> the protocol to do regular polling. Currently, the Hive client uses one 
> request for the query compile request. Long query compile times make those 
> requests vulnerable to the arbitrary proxy timeouts.
> Another issue is Hive Server 2 sometimes does not notice the client has 
> failed or has lost interest in a potentially long running query. This causes 
> Hive locks and Big Data query resources to be held unnecessarily. The 
> assumption is the client issues a cancel query request when it gets an error. 
> This assumption does not always hold. If the proxy returned an error itself, 
> that proxy may reject the subsequent cancel request, too. And, if the client 
> is killed or the network is down, the client cannot complete a cancel 
> request. The proposed solution here is for Hive Server 2 to watch that the 
> client is sending regular polling requests for status. If a client ceases 
> those requests, then Hive Server 2 will cancel the query.
> Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more 
> challenging because vendors provide ODBC drivers and Hive does not own the 
> ODBC protocol.
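The server-side watchdog proposed above could look roughly like the following sketch. The timeout value, class, and method names are invented for illustration; the idea is that the server records the time of each status poll, and a periodic check cancels queries whose client has stopped polling:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Server-side poll watchdog: track last poll time per query, cancel stale ones.
public class PollWatchdogSketch {
    private final Map<String, Long> lastPollMs = new ConcurrentHashMap<>();
    private final long timeoutMs;

    PollWatchdogSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called by the server on every client status-poll request
    void recordPoll(String queryId) {
        lastPollMs.put(queryId, System.currentTimeMillis());
    }

    // Invoked periodically by a server-side monitor; true means the client
    // went silent longer than the timeout and the query should be cancelled.
    boolean shouldCancel(String queryId, long nowMs) {
        Long last = lastPollMs.get(queryId);
        return last != null && nowMs - last > timeoutMs;
    }

    public static void main(String[] args) {
        PollWatchdogSketch watchdog = new PollWatchdogSketch(1000);
        watchdog.recordPoll("q1");
        System.out.println(watchdog.shouldCancel("q1", System.currentTimeMillis()));        // prints false
        System.out.println(watchdog.shouldCancel("q1", System.currentTimeMillis() + 5000)); // prints true
    }
}
```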





[jira] [Work logged] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?focusedWorklogId=605993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605993
 ]

ASF GitHub Bot logged work on HIVE-25195:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:17
Start Date: 03/Jun/21 15:17
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2347:
URL: https://github.com/apache/hive/pull/2347#issuecomment-853950016


   @pvary @lcspinter @szlta Can you please review this? Thank you!




Issue Time Tracking
---

Worklog Id: (was: 605993)
Remaining Estimate: 0h
Time Spent: 10m

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Updated] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25195:
--
Labels: pull-request-available  (was: )

> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Assigned] (HIVE-25195) Store Iceberg write commit and ctas information in QueryState

2021-06-03 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25195:
-


> Store Iceberg write commit and ctas information in QueryState 
> --
>
> Key: HIVE-25195
> URL: https://issues.apache.org/jira/browse/HIVE-25195
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> We should replace the current method of passing Iceberg write commit-related 
> information (jobID, task num) and CTAS info via the session conf using 
> prefixed keys. We have a new way of doing that more cleanly, using the 
> QueryState object. This should make the code easier to maintain and guard 
> against accidental session conf pollution.





[jira] [Work logged] (HIVE-24135) Drop database doesn't delete directory in managed location

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24135?focusedWorklogId=605976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605976
 ]

ASF GitHub Bot logged work on HIVE-24135:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 15:00
Start Date: 03/Jun/21 15:00
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #1506:
URL: https://github.com/apache/hive/pull/1506#discussion_r644869934



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -1917,6 +1923,12 @@ private void drop_database_core(RawStore ms, String 
catName,
 LOG.error("Failed to delete database directory: " + 
db.getLocationUri() +
 " " + e.getMessage());
   }
+  try {
+wh.deleteDir(wh.getDatabaseManagedPath(db), true, db);
+  } catch (Exception e) {
+LOG.error("Failed to delete database directory: " + 
db.getLocationUri() +

Review comment:
   I will rebase this patch. All this code has been refactored into 
HMSHandler.java. I will have to redo this fix. I will address it then.






Issue Time Tracking
---

Worklog Id: (was: 605976)
Time Spent: 50m  (was: 40m)

> Drop database doesn't delete directory in managed location
> --
>
> Key: HIVE-24135
> URL: https://issues.apache.org/jira/browse/HIVE-24135
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Repro:
>  say the default managed location is managed/hive and the default external 
> location is external/hive.
> {code:java}
> create database db1; -- creates: external/hive/db1.db
> create table db1.table1 (i int); -- creates: managed/hive/db1.db and  
> managed/hive/db1.db/table1
> drop database db1 cascade; -- removes : external/hive/db1.db and 
> managed/hive/db1.db/table1
> {code}
> Problem: Directory managed/hive/db1.db remains.
> Since HIVE-22995, dbs have a managed (managedLocationUri) and an external 
> location (locationUri). I think the issue is that 
> HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in 
> the external location.
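The fix direction implied above, deleting the managed location in addition to the external one, can be sketched as below. Local temp directories stand in for the warehouse paths and `Files.delete` stands in for `wh.deleteDir`; this is not the actual HMSHandler code. Each deletion gets its own try/catch so a failure on one path does not skip the other:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Drop both the external and the managed database directory independently.
public class DropDbSketch {
    static void dropDatabaseDirs(Path externalDir, Path managedDir) {
        for (Path dir : new Path[] {externalDir, managedDir}) {
            try {
                if (Files.exists(dir)) {
                    Files.delete(dir); // wh.deleteDir(...) in the real code
                }
            } catch (Exception e) {
                // Log and continue so the other location is still cleaned up
                System.err.println("Failed to delete database directory: " + dir + " " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path ext = Files.createTempDirectory("db1.db-external");
        Path man = Files.createTempDirectory("db1.db-managed");
        dropDatabaseDirs(ext, man);
        System.out.println(Files.exists(ext) || Files.exists(man)); // prints false
    }
}
```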





[jira] [Work logged] (HIVE-25086) Create Ranger Deny Policy for replication db in all cases if hive.repl.ranger.target.deny.policy is set to true.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25086?focusedWorklogId=605969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605969
 ]

ASF GitHub Bot logged work on HIVE-25086:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 14:49
Start Date: 03/Jun/21 14:49
Worklog Time Spent: 10m 
  Work Description: hmangla98 commented on a change in pull request #2240:
URL: https://github.com/apache/hive/pull/2240#discussion_r644860679



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/RangerDenyTask.java
##
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.utils.SecurityUtils;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.ranger.RangerRestClient;
+import org.apache.hadoop.hive.ql.exec.repl.ranger.RangerRestClientImpl;
+import org.apache.hadoop.hive.ql.exec.repl.ranger.NoOpRangerRestClient;
+import org.apache.hadoop.hive.ql.exec.repl.ranger.RangerPolicy;
+import org.apache.hadoop.hive.ql.exec.repl.ranger.RangerExportPolicyList;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.metric.event.Status;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.net.URL;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * RangerDenyTask.
+ *
+ * Task to add Ranger Deny Policy
+ **/
+public class RangerDenyTask extends Task<RangerDenyWork> implements Serializable {
+private static final long serialVersionUID = 1L;
+
+private static final Logger LOG = 
LoggerFactory.getLogger(RangerDenyTask.class);
+
+private transient RangerRestClient rangerRestClient;
+
+public RangerDenyTask() {
+super();
+}
+
+@VisibleForTesting
+RangerDenyTask(final RangerRestClient rangerRestClient, final HiveConf 
conf, final RangerDenyWork work) {
+this.conf = conf;
+this.work = work;
+this.rangerRestClient = rangerRestClient;
+}
+
+@Override
+public String getName() {
+return "RANGER_DENY";
+}
+
+@Override
+public int execute() {
+try {
+LOG.info("Checking Ranger Deny Policy for {}", 
work.getTargetDbName());
+SecurityUtils.reloginExpiringKeytabUser();
+if (rangerRestClient == null) {
+rangerRestClient = getRangerRestClient();
+}
+URL url = work.getRangerConfigResource();
+if (url == null) {
+throw new 
SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE
+.format("Ranger configuration is not valid "
++ 
ReplUtils.RANGER_CONFIGURATION_RESOURCE_NAME, ReplUtils.REPL_RANGER_SERVICE));
+}
+conf.addResource(url);

Review comment:
   This will add the existing Ranger resource config. Properties of this 
resource will override the existing properties in conf.






Issue Time Tracking
---

Worklog Id: (was: 605969)
Time Spent: 3h 40m  (was: 3.5h)

> Create Ranger Deny Policy for replication db in all cases if 
> hive.repl.ranger.target.deny.policy is set to true.
> 
>
> Key: HIVE-25086
> URL: 

[jira] [Work logged] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25189?focusedWorklogId=605961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605961
 ]

ASF GitHub Bot logged work on HIVE-25189:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 14:28
Start Date: 03/Jun/21 14:28
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera commented on a change in pull request 
#2342:
URL: https://github.com/apache/hive/pull/2342#discussion_r644842695



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CacheTableHelper.java
##
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.Set;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+
+import com.google.common.base.Preconditions;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Class to help populate the cache at the beginning of query analysis. We 
would like
+ * to minimize the number of calls to fetch validWriteIdLists from the 
metastore. HMS
+ * has an API to request this object for multiple tables within one call, and 
this class
+ * uses that API.
+ *
+ * The sole purpose of this class is to help populate the HMS query cache. 
Nothing is returned
+ * from the public methods. In this way, if another method attempts to fetch a 
validWriteIdList,
+ * the SessionHiveMetaStoreClient query cache will contain the information.
+ *
+ * Because this class is only responsible for cache population, it is not a 
requirement for
+ * a caller to supply all the tables necessary for the query. It is also not a 
requirement
+ * for the tables to be part of the query. Of course, the query qill benefit 
if those
+ * conditions were true, but if the table is not in the cache, a later call 
fetching the writeids
+ * will hit the HMS server and will not fail.
+ *
+ * One tricky aspect to this class is that if a view is passed in, we want to 
fetch the
+ * validWriteIdLists for the underlying tables. At the beginning of the query, 
it is impossible
+ * to know the underlying tables without contacting HMS.
+ *
+ * In order to handle the underlying tables to the views, we keep a cache that 
holds our
+ * best guess. If we see a view in any query, we set up a server-wide cache 
that tracks
+ * the underlying tables to the view. If the view doesn't change, then this 
information
+ * will be accurate and allow us to fetch the underlying tables on our next 
query. If the
+ * view does change and the underlying tables are different, our fetch won't 
retrieve the
+ * correct information. But that's ok...remember what was said earlier that it 
is not
+ * a requirement for the tables to be part of the query. Later on in the 
query, this class
+ * will be called on the view level via the populateCacheForView call. At that 
point, if
+ * something changed, it will populate the cache with the newly detected 
tables. It will also
+ * change the underlying table information for the view to optimize the next 
query using
+ * the view.
+ */
+public class CacheTableHelper {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(CacheTableHelper.class);
+
+  // Server wide cache used to hold what we currently think are the underlying 
tables
+  // for a view (which is the key). This information can go stale, but that's 
ok. The only
+  // repercussion of a stale view is that we will have to make an additional 
HMS call
+  // to retrieve the validWriteIdList for the changed tables.
+  // 

[jira] [Assigned] (HIVE-25194) Add support for STORED AS ORC/PARQUET/AVRO for Iceberg

2021-06-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25194:



> Add support for STORED AS ORC/PARQUET/AVRO for Iceberg
> --
>
> Key: HIVE-25194
> URL: https://issues.apache.org/jira/browse/HIVE-25194
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> Currently we have to specify the fileformat in TBLPROPERTIES during Iceberg 
> create table statements.
> The ideal syntax would be:
> CREATE TABLE tbl STORED BY ICEBERG STORED AS ORC ...
> One complication is that currently stored by and stored as are not permitted 
> within the same query, so that needs to be amended.





[jira] [Updated] (HIVE-22977) Merge delta files instead of running a query in major/minor compaction

2021-06-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér updated HIVE-22977:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Merge delta files instead of running a query in major/minor compaction
> --
>
> Key: HIVE-22977
> URL: https://issues.apache.org/jira/browse/HIVE-22977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
> Attachments: HIVE-22977.01.patch, HIVE-22977.02.patch
>
>
> [Compaction Optimization]
> We should analyse the possibility of moving delta files instead of running a 
> major/minor compaction query.
> Please consider the following use cases:
>  - full ACID table, but only insert queries were run. This means that no 
> delete delta directories were created. Is it possible to merge the delta 
> directory contents without running a compaction query?
>  - full ACID table, ingesting through the streaming API. If there 
> are no aborted transactions during streaming, is it possible to merge the 
> delta directory contents without running a compaction query?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25189) Cache the validWriteIdList in query cache before fetching tables from HMS

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25189?focusedWorklogId=605922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605922
 ]

ASF GitHub Bot logged work on HIVE-25189:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 13:42
Start Date: 03/Jun/21 13:42
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2342:
URL: https://github.com/apache/hive/pull/2342#discussion_r644466843



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CacheTableHelper.java
##
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.parse;
+
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.Set;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+
+import com.google.common.base.Preconditions;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
+import org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager;
+import org.apache.hadoop.hive.ql.session.SessionState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Class to help populate the cache at the beginning of query analysis. We 
would like
+ * to minimize the number of calls to fetch validWriteIdLists from the 
metastore. HMS
+ * has an API to request this object for multiple tables within one call, and 
this class
+ * uses that API.
+ *
+ * The sole purpose of this class is to help populate the HMS query cache. 
Nothing is returned
+ * from the public methods. In this way, if another method attempts to fetch a 
validWriteIdList,
+ * the SessionHiveMetaStoreClient query cache will contain the information.
+ *
+ * Because this class is only responsible for cache population, it is not a 
requirement for
+ * a caller to supply all the tables necessary for the query. It is also not a 
requirement
+ * for the tables to be part of the query. Of course, the query will benefit 
if those
+ * conditions were true, but if the table is not in the cache, a later call 
fetching the writeids
+ * will hit the HMS server and will not fail.
+ *
+ * One tricky aspect to this class is that if a view is passed in, we want to 
fetch the
+ * validWriteIdLists for the underlying tables. At the beginning of the query, 
it is impossible
+ * to know the underlying tables without contacting HMS.
+ *
+ * In order to handle the underlying tables to the views, we keep a cache that 
holds our
+ * best guess. If we see a view in any query, we set up a server-wide cache 
that tracks
+ * the underlying tables to the view. If the view doesn't change, then this 
information
+ * will be accurate and allow us to fetch the underlying tables on our next 
query. If the
+ * view does change and the underlying tables are different, our fetch won't 
retrieve the
+ * correct information. But that's ok...remember what was said earlier that it 
is not
+ * a requirement for the tables to be part of the query. Later on in the 
query, this class
+ * will be called on the view level via the populateCacheForView call. At that 
point, if
+ * something changed, it will populate the cache with the newly detected 
tables. It will also
+ * change the underying table information for the view to optimize the next 
query using
+ * the view.
+ */
+public class CacheTableHelper {
+  protected static final Logger LOG = 
LoggerFactory.getLogger(CacheTableHelper.class);
+
+  // Server wide cache used to hold what we currently think are the underlying 
tables
+  // for a view (which is the key). This information can go stale, but that's 
ok. The only
+  // repercussion of a stale view is that we will have to make an additional 
HMS call
+  // to retrieve the validWriteIdList for the changed tables.
+  // Will hold 
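The server-wide view cache described in the comments above can be sketched with a plain ConcurrentHashMap (the actual CacheTableHelper uses a Caffeine cache; the class and method names below are illustrative assumptions, not Hive's API):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: maps a fully qualified view name to its last
// known underlying tables. Stale entries are harmless -- a miss just means
// one extra HMS call for the changed tables, as the comment above notes.
public class ViewUnderlyingTableCache {
  private final Map<String, List<String>> viewToTables = new ConcurrentHashMap<>();

  // Best guess at the underlying tables; empty if the view was never seen.
  public List<String> guess(String qualifiedViewName) {
    return viewToTables.getOrDefault(qualifiedViewName, List.of());
  }

  // Called once the view has been analyzed and its true tables are known.
  public void update(String qualifiedViewName, List<String> tables) {
    viewToTables.put(qualifiedViewName, List.copyOf(tables));
  }

  public static void main(String[] args) {
    ViewUnderlyingTableCache cache = new ViewUnderlyingTableCache();
    cache.update("db.my_view", List.of("db.t1", "db.t2"));
    System.out.println(cache.guess("db.my_view"));
  }
}
```

A stale entry is simply corrected on the next `update` call, which matches the "best guess, repopulate later" design the comment describes.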

[jira] [Work logged] (HIVE-25165) Generate & track statistics per event type for incremental load in replication metrics

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25165?focusedWorklogId=605851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605851
 ]

ASF GitHub Bot logged work on HIVE-25165:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 11:30
Start Date: 03/Jun/21 11:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2321:
URL: https://github.com/apache/hive/pull/2321#discussion_r644714176



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.commons.collections4.map.ListOrderedMap;
+import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tracks the replication statistics per event type.
+ */
+public class ReplStatsTracker {
+
+  // Maintains the descriptive statistics per event type.
+  private HashMap<String, DescriptiveStatistics> descMap;
+
+  // Maintains the top K costliest eventId's
+  private HashMap<String, ListOrderedMap<Long, Long>> topKEvents;
+  // Number of top events to maintain.
+  private final int k;
+
+  public ReplStatsTracker(int k) {
+this.k = k;
+descMap = new HashMap<>();
+topKEvents = new HashMap<>();
+  }
+
+  /**
+   * Adds an entry for tracking.
+   * @param eventType the type of event.
+   * @param eventId the event id.
+   * @param timeTaken time taken to process the event.
+   */
+  public void addEntry(String eventType, String eventId, long timeTaken) {
+// Update the entry in the descriptive statistics.
+DescriptiveStatistics descStatistics = descMap.get(eventType);
+if (descStatistics == null) {
+  descStatistics = new DescriptiveStatistics();
+  descStatistics.addValue(timeTaken);
+  descMap.put(eventType, descStatistics);
+} else {
+  descStatistics.addValue(timeTaken);
+}
+
+// Tracking for top K events, Maintain the list in descending order.
+ListOrderedMap topKEntries = topKEvents.get(eventType);
+if (topKEntries == null) {
+  topKEntries = new ListOrderedMap<>();
+  topKEntries.put(Long.parseLong(eventId), timeTaken);
+  topKEvents.put(eventType, topKEntries);
+} else {
+  // Get the index of insertion, by descending order.
+  int index = Collections.binarySearch(new ArrayList<>(topKEntries.values()), timeTaken, Collections.reverseOrder());

Review comment:
   where are you sorting this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605851)
Time Spent: 0.5h  (was: 20m)

> Generate & track statistics per event type for incremental load in 
> replication metrics
> --
>
> Key: HIVE-25165
> URL: https://issues.apache.org/jira/browse/HIVE-25165
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Generate and track statistics like mean, median, standard deviation, variance, 
> etc. per event type during incremental load and store them in replication 
> statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
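On the reviewer's question above: `Collections.binarySearch` with `Collections.reverseOrder()` only works if the list is already maintained in descending order, which holds as long as every value is inserted at the index the search reports. A minimal, hypothetical sketch of that top-K insertion (simplified from the patch; not the actual ReplStatsTracker code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TopKDemo {
  // Insert value so the list stays in descending order, keeping at most k entries.
  static void addTopK(List<Long> topK, long value, int k) {
    int index = Collections.binarySearch(topK, value, Collections.reverseOrder());
    if (index < 0) {
      index = -index - 1;  // binarySearch encodes the insertion point as -(point) - 1
    }
    if (index < k) {
      topK.add(index, value);
      if (topK.size() > k) {
        topK.remove(topK.size() - 1);  // evict the smallest entry
      }
    }
  }

  public static void main(String[] args) {
    List<Long> topK = new ArrayList<>();
    for (long t : new long[] {10, 30, 20, 40, 5}) {
      addTopK(topK, t, 3);
    }
    System.out.println(topK);  // descending order is preserved by construction
  }
}
```

Because the list is sorted by construction, no explicit sort call is needed, which is presumably what the patch relies on.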


[jira] [Updated] (HIVE-25193) Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type

2021-06-03 Thread qiang.bi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiang.bi updated HIVE-25193:

Description: 
Problem statement:
{code:java}
set hive.vectorized.execution.enabled = true;
select nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price,
from dw_mdm_sync_asset;
{code}
 The error log:
{code:java}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVectorCaused by: 
java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector cannot be cast to 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:504)
 at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorCoalesce.evaluate(VectorCoalesce.java:124)
 at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
 at 
org.apache.hadoop.hive.ql.exec.vector.expressions.CastStringToDouble.evaluate(CastStringToDouble.java:83)
 at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
 ... 28 more{code}
 The problem HiveQL:
{code:java}
nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) corrected_price
{code}
 The problem expression:
{code:java}
CastStringToDouble(col 39:string)(children: VectorCoalesce(columns [37, 
38])(children: VectorUDFAdaptor(get_json_object(_col14, '$.correctedPrice')) -> 
37:string, ConstantVectorExpression(val 0.88) -> 38:decimal(2,2)) -> 39:string) 
-> 40:double
{code}
 The problem code:
{code:java}
public class VectorCoalesce extends VectorExpression {
  ...
  @Override
  public void evaluate(VectorizedRowBatch batch) throws HiveException {
    if (childExpressions != null) {
      super.evaluateChildren(batch);
    }
    int[] sel = batch.selected;
    int n = batch.size;
    ColumnVector outputColVector = batch.cols[outputColumnNum];
    boolean[] outputIsNull = outputColVector.isNull;
    if (n <= 0) {
      // Nothing to do
      return;
    }
    if (unassignedBatchIndices == null || n > unassignedBatchIndices.length) {
      // (Re)allocate larger to be a multiple of 1024 (DEFAULT_SIZE).
      final int roundUpSize =
          ((n + VectorizedRowBatch.DEFAULT_SIZE - 1) / VectorizedRowBatch.DEFAULT_SIZE)
              * VectorizedRowBatch.DEFAULT_SIZE;
      unassignedBatchIndices = new int[roundUpSize];
    }
    // We do not need to do a column reset since we are carefully changing the output.
    outputColVector.isRepeating = false;
    // CONSIDER: Should be do this for all vector expressions that can
    //   work on BytesColumnVector output columns???
    outputColVector.init();
    final int columnCount = inputColumns.length;
    /*
     * Process the input columns to find a non-NULL value for each row.
     *
     * We track the unassigned batchIndex of the rows that have not received
     * a non-NULL value yet.  Similar to a selected array.
     */
    boolean isAllUnassigned = true;
    int unassignedColumnCount = 0;
    for (int k = 0; k < inputColumns.length; k++) {
      ColumnVector cv = batch.cols[inputColumns[k]];
      if (cv.isRepeating) {
        if (cv.noNulls || !cv.isNull[0]) {
          /*
           * With a repeating value we can finish all remaining rows.
           */
          if (isAllUnassigned) {
            // No other columns provided non-NULL values.  We can return repeated output.
            outputIsNull[0] = false;
            outputColVector.setElement(0, 0, cv);
            outputColVector.isRepeating = true;
            return;
          } else {
            // Some rows have already been assigned values. Assign the remaining.
            // We cannot use copySelected method here.
            for (int i = 0; i < unassignedColumnCount; i++) {
              final int batchIndex = unassignedBatchIndices[i];
              outputIsNull[batchIndex] = false;
              // Our input is repeating (i.e. inputColNumber = 0).
              outputColVector.setElement(batchIndex, 0, cv);
            }
            return;
          }
        } else {
          // Repeated NULLs -- skip this input column.
        }
      } else {
        /*
         * Non-repeating input column. Use any non-NULL values for unassigned rows.
         */
        if (isAllUnassigned) {
          /*
           * No other columns provided non-NULL values.  We *may* be able to finish all rows
           * with this input column...
           */
          if (cv.noNulls) {
            // Since no NULLs, we can provide values for all rows.
            if (batch.selectedInUse) {
              for (int i = 0; i < n; i++) {
                final int batchIndex = sel[i];
                outputIsNull[batchIndex] = false;
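The crash happens because VectorCoalesce forwards the constant decimal column to BytesColumnVector.setElement, which downcasts its source vector unchecked. A standalone illustration of the same pitfall, using made-up stand-in classes rather than Hive's actual column vectors:

```java
// Hypothetical stand-ins for Hive's column vectors, only to show why an
// unchecked downcast in setElement throws ClassCastException when the
// coalesce inputs have different concrete vector types (string vs decimal).
abstract class ColVector {}

class BytesColVector extends ColVector {
  byte[][] vector = new byte[1][];

  void setElement(int outIndex, int inIndex, ColVector input) {
    // Mirrors the unchecked cast in BytesColumnVector.setElement.
    BytesColVector in = (BytesColVector) input;
    vector[outIndex] = in.vector[inIndex];
  }
}

class DecimalColVector extends ColVector {}

public class CoalesceCastDemo {
  public static void main(String[] args) {
    BytesColVector output = new BytesColVector();
    try {
      // A decimal default value reaches a string (bytes) output column.
      output.setElement(0, 0, new DecimalColVector());
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: decimal vector passed to bytes vector");
    }
  }
}
```

In the reported plan, ConstantVectorExpression produces a decimal(2,2) column while VectorCoalesce writes to a string (bytes) column, which is exactly this type mismatch.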

[jira] [Assigned] (HIVE-25193) Vectorized Query Execution: ClassCastException when use nvl() function which default_value is decimal type

2021-06-03 Thread qiang.bi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qiang.bi reassigned HIVE-25193:
---


> Vectorized Query Execution: ClassCastException when use nvl() function which 
> default_value is decimal type
> --
>
> Key: HIVE-25193
> URL: https://issues.apache.org/jira/browse/HIVE-25193
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: qiang.bi
>Assignee: qiang.bi
>Priority: Major
>
> Problem statement:
>  
> {code:java}
> set hive.vectorized.execution.enabled = true;
> select nvl(get_json_object(attr_json,'$.correctedPrice'),0.88) 
> corrected_price,
> from dw_mdm_sync_asset;
> {code}
>  
>  
> The error log:
>  
> {code:java}
> [2021-05-24 08:06:05.627]], TaskAttempt 3 failed, info=[Error: Error while 
> running task ( failure ) : 
> attempt_1619882873092_4567_1_03_00_3:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing 
> operators[2021-05-24 08:06:05.627]], TaskAttempt 3 failed, info=[Error: Error 
> while running task ( failure ) : 
> attempt_1619882873092_4567_1_03_00_3:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing 
> operators[2021-05-24 08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)[2021-05-24
>  08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)[2021-05-24
>  08:06:05.628] at java.security.AccessController.doPrivileged(Native 
> Method)[2021-05-24 08:06:05.628] at 
> javax.security.auth.Subject.doAs(Subject.java:422)[2021-05-24 08:06:05.628] 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)[2021-05-24
>  08:06:05.628] at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)[2021-05-24
>  08:06:05.628] at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)[2021-05-24
>  08:06:05.628] at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)[2021-05-24
>  08:06:05.628] at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)[2021-05-24
>  08:06:05.628] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[2021-05-24
>  08:06:05.628] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[2021-05-24
>  08:06:05.628] at java.lang.Thread.run(Thread.java:748)[2021-05-24 
> 08:06:05.628]Caused by: java.lang.RuntimeException: Hive Runtime Error while 
> closing operators[2021-05-24 08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:495)[2021-05-24
>  08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)[2021-05-24
>  08:06:05.628] ... 16 more[2021-05-24 08:06:05.628]Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> null[2021-05-24 08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:621)[2021-05-24
>  08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.closeOp(VectorMapJoinGenerateResultOperator.java:681)[2021-05-24
>  08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)[2021-05-24 
> 08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)[2021-05-24 
> 08:06:05.628] at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:484)[2021-05-24
>  08:06:05.628] ... 17 more[2021-05-24 08:06:05.628]Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error 

[jira] [Work logged] (HIVE-25165) Generate & track statistics per event type for incremental load in replication metrics

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25165?focusedWorklogId=605805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605805
 ]

ASF GitHub Bot logged work on HIVE-25165:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 10:01
Start Date: 03/Jun/21 10:01
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2321:
URL: https://github.com/apache/hive/pull/2321#discussion_r643827420



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/ReplLogger.java
##
@@ -47,4 +48,8 @@ public void dataCopyLog(String message) {
 
   public void setParams(String dbName, String dumpDirectory, long numTables, 
long numFunctions) {
   }
+
+  public ReplStatsTracker getReplStatsTracker() {

Review comment:
   mark this as abstract

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplStatsTracker.java
##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.commons.collections4.map.ListOrderedMap;
+import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Tracks the replication statistics per event type.
+ */
+public class ReplStatsTracker {
+
+  // Maintains the descriptive statistics per event type.
+  private HashMap<String, DescriptiveStatistics> descMap;
+
+  // Maintains the top K costliest eventId's
+  private HashMap<String, ListOrderedMap<Long, Long>> topKEvents;

Review comment:
   this is not thread safe. check if it can have an impact

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/event/Stage.java
##
@@ -35,6 +35,7 @@
   private Map metrics = new HashMap<>();
   private String errorLogPath;
   private SnapshotUtils.ReplSnapshotCount replSnapshotCount = new 
SnapshotUtils.ReplSnapshotCount();
+  private String replStats = "";

Review comment:
   set to null instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605805)
Time Spent: 20m  (was: 10m)

> Generate & track statistics per event type for incremental load in 
> replication metrics
> --
>
> Key: HIVE-25165
> URL: https://issues.apache.org/jira/browse/HIVE-25165
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Generate and track statistics like mean, median, standard deviation, variance, 
> etc. per event type during incremental load and store them in replication 
> statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
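The per-event-type statistics the issue describes can be sketched with the plain JDK (the actual patch uses commons-math3's DescriptiveStatistics; the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK sketch of per-event-type replication statistics; not the
// actual ReplStatsTracker implementation.
public class EventStats {
  private final Map<String, List<Long>> samples = new HashMap<>();

  public void addEntry(String eventType, long timeTaken) {
    samples.computeIfAbsent(eventType, k -> new ArrayList<>()).add(timeTaken);
  }

  public double mean(String eventType) {
    return samples.get(eventType).stream().mapToLong(Long::longValue).average().orElse(0);
  }

  public double median(String eventType) {
    List<Long> sorted = new ArrayList<>(samples.get(eventType));
    Collections.sort(sorted);
    int n = sorted.size();
    return n % 2 == 1 ? sorted.get(n / 2)
                      : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
  }

  public static void main(String[] args) {
    EventStats s = new EventStats();
    s.addEntry("ADD_PARTITION", 10);
    s.addEntry("ADD_PARTITION", 20);
    s.addEntry("ADD_PARTITION", 30);
    System.out.println(s.mean("ADD_PARTITION") + " " + s.median("ADD_PARTITION"));
  }
}
```

DescriptiveStatistics additionally gives variance, standard deviation, and arbitrary percentiles out of the box, which is why the patch uses it instead of hand-rolled aggregates.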


[jira] [Work logged] (HIVE-24749) Disable user's UDF use SystemExit

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24749?focusedWorklogId=605771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605771
 ]

ASF GitHub Bot logged work on HIVE-24749:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 08:36
Start Date: 03/Jun/21 08:36
Worklog Time Spent: 10m 
  Work Description: StefanXiepj closed pull request #1955:
URL: https://github.com/apache/hive/pull/1955


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605771)
Time Spent: 20m  (was: 10m)

> Disable user's UDF use SystemExit
> -
>
> Key: HIVE-24749
> URL: https://issues.apache.org/jira/browse/HIVE-24749
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If System.exit() is executed in a user's UDF while the default 
> SecurityManager is in use, the HS2 service process will exit, which is 
> very bad.
> It is safer to use a NoExitSecurityManager, which can intercept System.exit().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
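The NoExitSecurityManager idea from the issue can be sketched as below; this is an assumption-laden illustration, not the actual Hive patch. Note that SecurityManager is deprecated in recent JDKs, and on JDK 18+ installing one requires -Djava.security.manager=allow.

```java
import java.security.Permission;

public class NoExitDemo {
  // Sketch: turn System.exit() in UDF code into a catchable exception
  // instead of letting it kill the HiveServer2 process.
  static class NoExitSecurityManager extends SecurityManager {
    @Override
    public void checkPermission(Permission perm) {
      // Permit everything else; we only intercept VM exit.
    }

    @Override
    public void checkExit(int status) {
      throw new SecurityException("System.exit(" + status + ") blocked");
    }
  }

  public static void main(String[] args) {
    try {
      System.setSecurityManager(new NoExitSecurityManager());
    } catch (UnsupportedOperationException e) {
      System.out.println("SecurityManager disallowed on this JDK");
      return;
    }
    try {
      System.exit(1);  // what a misbehaving UDF might call
    } catch (SecurityException e) {
      System.out.println("intercepted: " + e.getMessage());
    } finally {
      System.setSecurityManager(null);
    }
  }
}
```

The server would install the manager around UDF execution and catch the SecurityException, logging the offending UDF instead of dying.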


[jira] [Work logged] (HIVE-25192) No need to create table directory for the non-native table

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25192?focusedWorklogId=605765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605765
 ]

ASF GitHub Bot logged work on HIVE-25192:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 08:25
Start Date: 03/Jun/21 08:25
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2346:
URL: https://github.com/apache/hive/pull/2346


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605765)
Remaining Estimate: 0h
Time Spent: 10m

> No need to create table directory for the non-native table
> --
>
> Key: HIVE-25192
> URL: https://issues.apache.org/jira/browse/HIVE-25192
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating non-native tables such as Kudu or HBase tables, we always create 
> a warehouse location for them, even though they may not use the 
> location to store data or for the job plan, so there is no need to create such 
> a location. 
> We should also skip getting the input summary of non-native tables in some 
> cases; this avoids an OOM problem when building the hash table if the 
> non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25192) No need to create table directory for the non-native table

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25192:
--
Labels: pull-request-available  (was: )

> No need to create table directory for the non-native table
> --
>
> Key: HIVE-25192
> URL: https://issues.apache.org/jira/browse/HIVE-25192
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating non-native tables such as Kudu or HBase tables, we always create 
> a warehouse location for them, even though they may not use the 
> location to store data or for the job plan, so there is no need to create such 
> a location. 
> We should also skip getting the input summary of non-native tables in some 
> cases; this avoids an OOM problem when building the hash table if the 
> non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25192) No need to create table directory for the non-native table

2021-06-03 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-25192:
--

Assignee: Zhihua Deng

> No need to create table directory for the non-native table
> --
>
> Key: HIVE-25192
> URL: https://issues.apache.org/jira/browse/HIVE-25192
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> When creating non-native tables such as Kudu or HBase tables, we always create 
> a warehouse location for them, even though they may not use the 
> location to store data or for the job plan, so there is no need to create such 
> a location. 
> We should also skip getting the input summary of non-native tables in some 
> cases; this avoids an OOM problem when building the hash table if the 
> non-native table is on the build side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605713
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 07:15
Start Date: 03/Jun/21 07:15
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644544768



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
 }
 String cmd = null;
 try {
+  TableName tb = req.tableName;

Review comment:
   add a test where the db being failed over is picked up first and the other 
db is picked up later




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605713)
Time Spent: 3h 40m  (was: 3.5h)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failoved over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605710
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 07:13
Start Date: 03/Jun/21 07:13
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644543669



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##
@@ -133,11 +135,21 @@ public void run() {
 LOG.info("Looking for tables using catalog: {} dbPattern: {} 
tablePattern: {} found: {}", catalogName,
   dbPattern, tablePattern, foundTableMetas.size());
 
+Map databasesToSkip = new HashMap<>();
+
 for (TableMeta tableMeta : foundTableMetas) {
   try {
+String dbName = 
MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), 
tableMeta.getDbName(), conf);
+if (!databasesToSkip.containsKey(dbName)) {
+  Database db = msc.getDatabase(tableMeta.getCatName(), 
tableMeta.getDbName());
+  databasesToSkip.put(dbName, isTargetOfReplication(db) || 
MetaStoreUtils.isDbBeingFailedOver(db));

Review comment:
   Add an INFO-level log for the DB stating why it is getting skipped...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 605710)
Time Spent: 3.5h  (was: 3h 20m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605709
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 07:12
Start Date: 03/Jun/21 07:12
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644543165



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
##
@@ -133,11 +135,21 @@ public void run() {
 LOG.info("Looking for tables using catalog: {} dbPattern: {} tablePattern: {} found: {}", catalogName,
   dbPattern, tablePattern, foundTableMetas.size());
 
+Map<String, Boolean> databasesToSkip = new HashMap<>();
+
 for (TableMeta tableMeta : foundTableMetas) {
   try {
+String dbName = MetaStoreUtils.prependCatalogToDbName(tableMeta.getCatName(), tableMeta.getDbName(), conf);
+if (!databasesToSkip.containsKey(dbName)) {
+  Database db = msc.getDatabase(tableMeta.getCatName(), tableMeta.getDbName());
+  databasesToSkip.put(dbName, isTargetOfReplication(db) || MetaStoreUtils.isDbBeingFailedOver(db));
+}
+if (databasesToSkip.get(dbName)) {
+  LOG.info("Skipping table : {}", tableMeta.getTableName());

Review comment:
   use debug.






Issue Time Tracking
---

Worklog Id: (was: 605709)
Time Spent: 3h 20m  (was: 3h 10m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605707
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 07:08
Start Date: 03/Jun/21 07:08
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644540793



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
 return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
   Also, regarding ReplConst.TRUE.equals: do we need to handle case sensitivity?
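For illustration, the difference between the strict check in the patch and a case-insensitive alternative can be sketched as follows. `TRUE` here is a hypothetical stand-in for `ReplConst.TRUE`; this is not the actual Hive code:

```java
public class CaseCheckSketch {
    // Hypothetical stand-in for ReplConst.TRUE.
    static final String TRUE = "true";

    // Strict check, as in the patch: "TRUE" or "True" would not match.
    static boolean strictCheck(String value) {
        return TRUE.equals(value);
    }

    // Case-insensitive alternative the reviewer is asking about.
    // Both variants are null-safe: a null argument simply returns false.
    static boolean lenientCheck(String value) {
        return TRUE.equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        System.out.println(strictCheck("TRUE"));   // false
        System.out.println(lenientCheck("TRUE"));  // true
    }
}
```

Whether case insensitivity is wanted depends on who writes the property value; if it is only ever set by Hive's own replication code, the strict check suffices.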






Issue Time Tracking
---

Worklog Id: (was: 605707)
Time Spent: 3h 10m  (was: 3h)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605705
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 07:07
Start Date: 03/Jun/21 07:07
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644540036



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
##
@@ -229,6 +231,15 @@ public static boolean isExternalTable(Table table) {
 return isExternal(params);
   }
 
+  public static boolean isDbBeingFailedOver(Database db) {
+    assert (db != null);
+    Map<String, String> dbParameters = db.getParameters();
+    if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) {
+      return ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
+    }
+    return false;

Review comment:
   Does this single line suffice?
   return dbParameters != null && ReplConst.TRUE.equals(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED));
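The equivalence holds because `Map.get` returns null for a missing key and `"true".equals(null)` is false, so the `containsKey` branch adds nothing. A self-contained sketch; the constant names mirror `ReplConst` but their values here are illustrative placeholders:

```java
import java.util.HashMap;
import java.util.Map;

public class FailoverFlagSketch {
    // Illustrative stand-ins; the real values live in ReplConst.
    static final String REPL_FAILOVER_ENABLED = "repl.failover.enabled";
    static final String TRUE = "true";

    // Single-line form suggested in the review: null-safe for both a
    // null parameter map and a missing key.
    static boolean isDbBeingFailedOver(Map<String, String> dbParameters) {
        return dbParameters != null && TRUE.equals(dbParameters.get(REPL_FAILOVER_ENABLED));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(REPL_FAILOVER_ENABLED, "true");
        System.out.println(isDbBeingFailedOver(params));          // true
        System.out.println(isDbBeingFailedOver(null));            // false
        System.out.println(isDbBeingFailedOver(new HashMap<>())); // false
    }
}
```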






Issue Time Tracking
---

Worklog Id: (was: 605705)
Time Spent: 3h  (was: 2h 50m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=605698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605698
 ]

ASF GitHub Bot logged work on HIVE-25154:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 06:51
Start Date: 03/Jun/21 06:51
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #2311:
URL: https://github.com/apache/hive/pull/2311#discussion_r644531296



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
##
@@ -625,6 +633,11 @@ public boolean runOneWorkerIteration(
 }
 String cmd = null;
 try {
+  TableName tb = req.tableName;

Review comment:
   If the very first table belongs to a DB that is being failed over, it will break the logic for "doWait".






Issue Time Tracking
---

Worklog Id: (was: 605698)
Time Spent: 2h 50m  (was: 2h 40m)

> Disable StatsUpdaterThread and PartitionManagementTask for db that is being 
> failed over.
> --
>
> Key: HIVE-25154
> URL: https://issues.apache.org/jira/browse/HIVE-25154
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25180) Update netty to 4.1.60.Final

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25180?focusedWorklogId=605691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605691
 ]

ASF GitHub Bot logged work on HIVE-25180:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 06:37
Start Date: 03/Jun/21 06:37
Worklog Time Spent: 10m 
  Work Description: csjuhasz-c opened a new pull request #2345:
URL: https://github.com/apache/hive/pull/2345


   
   
   ### What changes were proposed in this pull request?
   
   The isNull field should be set on vectors nested in StructColumnVector instances, so that null handling is done by Hive instead of netty.
   
   ### Why are the changes needed?
   
   netty 4.1.60.Final introduced a [null check](https://github.com/netty/netty/commit/c717d4b97aa164a500098c6fb7b834535d13bf53#diff-49ee8d7612d5ecfcc27b46c38a801ad32ebdb169f7d79f1577313a1de70b0fbbR537), breaking arrow serialization.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   No new tests were added; the netty update broke some existing ones.
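The mechanism can be illustrated with a minimal stand-in. These tiny classes only model the isNull/noNulls fields of Hive's ColumnVector/StructColumnVector (which live in org.apache.hadoop.hive.ql.exec.vector); they are not the real API:

```java
public class NestedNullSketch {
    // Minimal model of a vector: per-row isNull flags plus a noNulls shortcut.
    static class ColVec {
        boolean noNulls = true;
        boolean[] isNull;
        ColVec(int n) { isNull = new boolean[n]; }
    }

    // A struct vector holds child vectors, one per struct field.
    static class StructColVec extends ColVec {
        ColVec[] fields;
        StructColVec(int n, ColVec... fields) { super(n); this.fields = fields; }
    }

    // Propagate a null at row i from the struct down into every child, so a
    // downstream serializer never dereferences an unset nested value.
    static void markNull(StructColVec struct, int i) {
        struct.isNull[i] = true;
        struct.noNulls = false;
        for (ColVec f : struct.fields) {
            f.isNull[i] = true;
            f.noNulls = false;
        }
    }
}
```

Without the propagation step, a child vector can claim noNulls while holding garbage at rows where the enclosing struct is null, which is what the new netty null check then trips over.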




Issue Time Tracking
---

Worklog Id: (was: 605691)
Time Spent: 20m  (was: 10m)

> Update netty to 4.1.60.Final
> 
>
> Key: HIVE-25180
> URL: https://issues.apache.org/jira/browse/HIVE-25180
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=605682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605682
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 03/Jun/21 06:16
Start Date: 03/Jun/21 06:16
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2344:
URL: https://github.com/apache/hive/pull/2344


   …properly
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 605682)
Time Spent: 4.5h  (was: 4h 20m)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> After patched [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895],  
> The metastore still has seen a memory leak on db resources: many 
> StatementImpls left unclosed.


