[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

2021-05-26 Thread Wei Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-25170:
-
Description: 
 
{code:java}

SET hive.remove.orderby.in.subquery=false;

EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Stage-0
  Fetch Operator
limit:10
Stage-1
  Reducer 3
  File Output Operator [FS_10]
Limit [LIM_9] (rows=1 width=368)
  Number of rows:10
  Select Operator [SEL_8] (rows=1 width=368)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_7] (rows=1 width=368)
  Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
<-Reducer 2 [SIMPLE_EDGE]
  SHUFFLE [RS_6]
PartitionCols:'constant', 'constant'
Group By Operator [GBY_5] (rows=1 width=368)
  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
  Select Operator [SEL_3] (rows=500 width=178)
Output:["_col2"]
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_2]
  PartitionCols:'constant', _col1
  Select Operator [SEL_1] (rows=500 width=178)
Output:["_col1","_col2"]
TableScan [TS_0] (rows=500 width=10)
  src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
Obviously, the PartitionCols in Reducer 2 are wrong: instead of 'constant', 
'constant', they should be 'constant', _col1.

 

That's because after HIVE-13808, SemanticAnalyzer uses sortCols to generate 
the key part of the colExprMap structure, while the key columns themselves are 
generated from newSortCols, leading to a column-to-expression mismatch when 
the constant column is not the trailing column among the key columns.

The constant propagation optimizer uses this colExprMap, finds an extra 
constant expression in the mismatched map, and produces this erroneous plan.

 

In fact, colExprMap is consumed by multiple optimizers, which makes this quite 
a serious problem.
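To make the mismatch concrete, here is a minimal, self-contained Java sketch. The names are hypothetical; the real structures are the reduce sink's key columns and the ExprNodeDesc map built inside SemanticAnalyzer. The point is only that zipping key columns from one list with expressions taken from a differently ordered list silently maps a real column to a constant:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColExprMapSketch {

    // Pair each internal key column name with the i-th expression.
    // If the two lists were built from differently ordered sources,
    // the pairing silently goes wrong.
    static Map<String, String> buildColExprMap(List<String> keyCols,
                                               List<String> exprs) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        for (int i = 0; i < keyCols.size(); i++) {
            colExprMap.put(keyCols.get(i), exprs.get(i));
        }
        return colExprMap;
    }

    public static void main(String[] args) {
        // Key columns as emitted for DISTRIBUTE BY constant_col, key ...
        List<String> keyCols = Arrays.asList("KEY._col0", "KEY._col1");
        // ... but paired with expressions from a list in which the constant
        // column was ordered differently, so _col1 never enters the map.
        List<String> mismatched = Arrays.asList("'constant'", "'constant'");
        Map<String, String> colExprMap = buildColExprMap(keyCols, mismatched);
        // Constant propagation now treats KEY._col1 as a constant and
        // rewrites the partition columns to 'constant', 'constant'.
        System.out.println(colExprMap.get("KEY._col1")); // 'constant'
    }
}
```

With correctly ordered expressions ('constant', _col1) the same map would give KEY._col1 back its real column, and the PartitionCols would be right.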

  was:
 
{code:java}
// code placeholder

EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)Stage-0
  Fetch Operator
limit:10
Stage-1
  Reducer 3
  File Output Operator [FS_10]
Limit [LIM_9] (rows=1 width=368)
  Number of rows:10
  Select Operator [SEL_8] (rows=1 width=368)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_7] (rows=1 width=368)
  
Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant',
 'constant'
<-Reducer 2 [SIMPLE_EDGE]
  SHUFFLE [RS_6]
PartitionCols:'constant', 'constant'
Group By Operator [GBY_5] (rows=1 width=368)
  
Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 
'constant'
  Select Operator [SEL_3] (rows=500 width=178)
Output:["_col2"]
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_2]
  PartitionCols:'constant', _col1
  Select Operator [SEL_1] (rows=500 width=178)
Output:["_col1","_col2"]
TableScan [TS_0] (rows=500 width=10)
  
src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
Obviously, the PartitionCols in Reducer 2 is wrong. Instead of 'constant', 
'constant', it should be 'constant', _col1

 

That's because after HIVE-13808,  SemanticAnalyzer uses sortCols to generate 
the colExprMap structure in the key part, while the key columns are generated 
by newSortCols, leading to a column and expr mismatch when the constant column 
is not the trailing column in the key columns.

Constant propagation optimizer uses this colExprMap and finds extra const 
expression in the mismatched map, resulting in this error.

 

In fact, colExprMap is used by multiple optimizers, which makes this quite a 
serious problem.


> Data error in constant propagation caused by wrong colExprMap generated in 
> SemanticAnalyzer
> ---
>
> Key: HIVE-25170
> URL: https://issues.apache.org/jira/browse/HIVE-25170
>  

[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

2021-05-26 Thread Wei Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang updated HIVE-25170:
-
Description: 
 
{code:java}
// code placeholder

EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Stage-0
  Fetch Operator
limit:10
Stage-1
  Reducer 3
  File Output Operator [FS_10]
Limit [LIM_9] (rows=1 width=368)
  Number of rows:10
  Select Operator [SEL_8] (rows=1 width=368)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_7] (rows=1 width=368)
  Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
<-Reducer 2 [SIMPLE_EDGE]
  SHUFFLE [RS_6]
PartitionCols:'constant', 'constant'
Group By Operator [GBY_5] (rows=1 width=368)
  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
  Select Operator [SEL_3] (rows=500 width=178)
Output:["_col2"]
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_2]
  PartitionCols:'constant', _col1
  Select Operator [SEL_1] (rows=500 width=178)
Output:["_col1","_col2"]
TableScan [TS_0] (rows=500 width=10)
  src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
Obviously, the PartitionCols in Reducer 2 are wrong: instead of 'constant', 
'constant', they should be 'constant', _col1.

 

That's because after HIVE-13808, SemanticAnalyzer uses sortCols to generate 
the key part of the colExprMap structure, while the key columns themselves are 
generated from newSortCols, leading to a column-to-expression mismatch when 
the constant column is not the trailing column among the key columns.

The constant propagation optimizer uses this colExprMap, finds an extra 
constant expression in the mismatched map, and produces this erroneous plan.

 

In fact, colExprMap is consumed by multiple optimizers, which makes this quite 
a serious problem.

  was:
 
{code:java}
// code placeholder

EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)Stage-0
  Fetch Operator
limit:10
Stage-1
  Reducer 3
  File Output Operator [FS_10]
Limit [LIM_9] (rows=1 width=368)
  Number of rows:10
  Select Operator [SEL_8] (rows=1 width=368)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_7] (rows=1 width=368)
  
Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant',
 'constant'
<-Reducer 2 [SIMPLE_EDGE]
  SHUFFLE [RS_6]
PartitionCols:'constant', 'constant'
Group By Operator [GBY_5] (rows=1 width=368)
  
Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 
'constant'
  Select Operator [SEL_3] (rows=500 width=178)
Output:["_col2"]
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_2]
  PartitionCols:'constant', _col1
  Select Operator [SEL_1] (rows=500 width=178)
Output:["_col1","_col2"]
TableScan [TS_0] (rows=500 width=10)
  
src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
Obviously, the `PartitionCols` in Reducer 2 is wrong. Instead of `'constant', 
'constant'`, it should be `'constant', _col1`

 

That's because after HIVE-13808,  `SemanticAnalyzer` uses `sortCols` to 
generate the `colExprMap` structure in the key part, while the key columns are 
generated by `newSortCols`, leading to a column and expr mismatch when the 
constant column is not the trailing column in the key columns.

 


> Data error in constant propagation caused by wrong colExprMap generated in 
> SemanticAnalyzer
> ---
>
> Key: HIVE-25170
> URL: https://issues.apache.org/jira/browse/HIVE-25170
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.2
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>
>  
> {code:java}
> // code 

[jira] [Commented] (HIVE-24614) using coalesce via vector, when data types of columns are different between source and target, the result of target is zero

2021-05-26 Thread junnan.yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352265#comment-17352265
 ] 

junnan.yang commented on HIVE-24614:


# This is similar to HIVE-25169; maybe the patch in HIVE-25169 can resolve 
your problem.

> using coalesce via vector, when data types of columns are different between 
> source and target, the result of target is zero
> ---
>
> Key: HIVE-24614
> URL: https://issues.apache.org/jira/browse/HIVE-24614
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.2
>Reporter: taoyuyin
>Priority: Major
>
> set hive.vectorized.execution.enabled=true;
>  
> CREATE TABLE `tmp.tmp_test_vectorization_source`( 
>  `rn` string,
>  `val_1` int,
>  `val_2` bigint)
>  stored as parquet;
>  
> insert into table `tmp.tmp_test_vectorization_source` 
> values('line1',1000,10001),('line1',2000,20001);
>  
> select rn,val_1,val_2 from tmp.tmp_test_vectorization_source t;
>  
> +-------+-------+-------+
> | rn    | val_1 | val_2 |
> +-------+-------+-------+
> | line1 | 1000  | 10001 |
> | line1 | 2000  | 20001 |
> +-------+-------+-------+
>  
> CREATE TABLE `tmp.tmp_test_vectorization_target`( 
>  `rn` string,
>  `val_1` bigint,
>  `val_2` int)
>  stored as parquet;
>  
> insert into table tmp.tmp_test_vectorization_target
>  select
>  rn,
>  coalesce(val_1,0),
>  coalesce(val_2,0)
>  from tmp.tmp_test_vectorization_source;
>  
> select rn,val_1,val_2 from tmp.tmp_test_vectorization_target t;
>  
> +-------+-------+-------+
> | rn    | val_1 | val_2 |
> +-------+-------+-------+
> | line1 | 0     | 0     |
> | line1 | 0     | 0     |
> +-------+-------+-------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

2021-05-26 Thread Wei Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zhang reassigned HIVE-25170:



> Data error in constant propagation caused by wrong colExprMap generated in 
> SemanticAnalyzer
> ---
>
> Key: HIVE-25170
> URL: https://issues.apache.org/jira/browse/HIVE-25170
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.2
>Reporter: Wei Zhang
>Assignee: Wei Zhang
>Priority: Major
>
>  
> {code:java}
> // code placeholder
> EXPLAIN
> SELECT constant_col, key, max(value)
> FROM
> (
>   SELECT 'constant' as constant_col, key, value
>   FROM src
>   DISTRIBUTE BY constant_col, key
>   SORT BY constant_col, key, value
> ) a
> GROUP BY constant_col, key
> LIMIT 10;
> OK
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)Stage-0
>   Fetch Operator
> limit:10
> Stage-1
>   Reducer 3
>   File Output Operator [FS_10]
> Limit [LIM_9] (rows=1 width=368)
>   Number of rows:10
>   Select Operator [SEL_8] (rows=1 width=368)
> Output:["_col0","_col1","_col2"]
> Group By Operator [GBY_7] (rows=1 width=368)
>   
> Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant',
>  'constant'
> <-Reducer 2 [SIMPLE_EDGE]
>   SHUFFLE [RS_6]
> PartitionCols:'constant', 'constant'
> Group By Operator [GBY_5] (rows=1 width=368)
>   
> Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 
> 'constant'
>   Select Operator [SEL_3] (rows=500 width=178)
> Output:["_col2"]
>   <-Map 1 [SIMPLE_EDGE]
> SHUFFLE [RS_2]
>   PartitionCols:'constant', _col1
>   Select Operator [SEL_1] (rows=500 width=178)
> Output:["_col1","_col2"]
> TableScan [TS_0] (rows=500 width=10)
>   
> src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
> Obviously, the `PartitionCols` in Reducer 2 is wrong. Instead of `'constant', 
> 'constant'`, it should be `'constant', _col1`
>  
> That's because after HIVE-13808,  `SemanticAnalyzer` uses `sortCols` to 
> generate the `colExprMap` structure in the key part, while the key columns 
> are generated by `newSortCols`, leading to a column and expr mismatch when 
> the constant column is not the trailing column in the key columns.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25169) using coalesce via vector,source column type is int and target column type is bigint,the result of target is zero

2021-05-26 Thread junnan.yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

junnan.yang updated HIVE-25169:
---
Description: 
sourceTable:

    product_id int;

###

targetTable:

    product_id bigint;

##

sql: 

    insert overwrite table targetTable
    select 

    ..

     coalesce(product_id,-1),

    ..

    from sourceTable;

##

explain sql :

     UDFToLong(COALESCE(product_id,-1)) (type: bigint)

##

result :

     the column product_id in targetTable is zero, which is a wrong result
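As a purely hypothetical model of how such a bug can yield zeros (this is not Hive's actual vectorization code; column names and the wiring are illustrative): in a vectorized batch both int and bigint columns share long[] storage, and scratch columns start zero-filled, so a cast expression wired to read the wrong scratch column copies only zeros into the target.

```java
import java.util.Arrays;

public class VectorizedCastSketch {

    // Copy one column of the batch into a fresh output column, standing in
    // for a vectorized widening cast such as UDFToLong.
    static long[] castColumn(long[][] batch, int inputCol) {
        return Arrays.copyOf(batch[inputCol], batch[inputCol].length);
    }

    public static void main(String[] args) {
        // Hypothetical row batch: each column is a long[], and scratch
        // columns start zero-filled.
        long[][] batch = new long[3][2];
        batch[0] = new long[]{1000, 2000};   // source product_id (int)

        // COALESCE(product_id, -1) writes its result into scratch column 1 ...
        batch[1] = castColumn(batch, 0);

        // ... but if the wrapping cast is wired to a different, still
        // zero-filled scratch column, the target receives only zeros.
        long[] target = castColumn(batch, 2);
        System.out.println(Arrays.toString(target)); // [0, 0]
    }
}
```

The real root cause may differ; the sketch only shows why a column-wiring mistake in a vectorized expression surfaces as zeros rather than as an error.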

 

 

  was:
source table:

    product_id int;

###

target table:

    product_id bigint;

##

sql: 

     coalesce(product_id,-1)

##

explain sql :

 

##

     

 

 


> using coalesce via vector,source column type is int and target column type is 
> bigint,the result of target is zero
> -
>
> Key: HIVE-25169
> URL: https://issues.apache.org/jira/browse/HIVE-25169
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.2
>Reporter: junnan.yang
>Priority: Major
> Attachments: HIVE-25169.01.patch
>
>
> sourceTable:
>     product_id int;
> ###
> targetTable:
>     product_id bigint;
> ##
> sql: 
>     insert overwrite table targetTable:
>     select 
>     ..
>      coalesce(product_id,-1),
>     ..
>     from sourceTable;
> ##
> explain sql :
>      UDFToLong(COALESCE(product_id,-1)) (type: bigint)
> ##
> result :
>      the column product_id in targetTable is zero, this is wrong result
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25169) using coalesce via vector,source column type is int and target column type is bigint,the result of target is zero

2021-05-26 Thread junnan.yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

junnan.yang updated HIVE-25169:
---
Description: 
source table:

    product_id int;

###

target table:

    product_id bigint;

##

sql: 

     coalesce(product_id,-1)

##

explain sql :

 

##

     

 

 

> using coalesce via vector,source column type is int and target column type is 
> bigint,the result of target is zero
> -
>
> Key: HIVE-25169
> URL: https://issues.apache.org/jira/browse/HIVE-25169
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.2
>Reporter: junnan.yang
>Priority: Major
> Attachments: HIVE-25169.01.patch
>
>
> source table:
>     product_id int;
> ###
> target table:
>     product_id bigint;
> ##
> sql: 
>      coalesce(product_id,-1)
> ##
> explain sql :
>  
> ##
>      
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25169) using coalesce via vector,source column type is int and target column type is bigint,the result of target is zero

2021-05-26 Thread junnan.yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

junnan.yang updated HIVE-25169:
---
Attachment: HIVE-25169.01.patch

> using coalesce via vector,source column type is int and target column type is 
> bigint,the result of target is zero
> -
>
> Key: HIVE-25169
> URL: https://issues.apache.org/jira/browse/HIVE-25169
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.1.2
>Reporter: junnan.yang
>Priority: Major
> Attachments: HIVE-25169.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602709
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 27/May/21 02:36
Start Date: 27/May/21 02:36
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2312:
URL: https://github.com/apache/hive/pull/2312#issuecomment-849271893


   @klcopp @miklosgergely If you have time. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602709)
Time Spent: 3h 10m  (was: 3h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24903) Change String.getBytes() to DFSUtil.string2Bytes(String) to avoid Unsupported Encoding Exception

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24903?focusedWorklogId=602689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602689
 ]

ASF GitHub Bot logged work on HIVE-24903:
-

Author: ASF GitHub Bot
Created on: 27/May/21 01:06
Start Date: 27/May/21 01:06
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2127:
URL: https://github.com/apache/hive/pull/2127#issuecomment-849235452


   Please change title of PR and JIRA


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602689)
Time Spent: 0.5h  (was: 20m)

> Change String.getBytes() to DFSUtil.string2Bytes(String) to avoid Unsupported 
> Encoding Exception
> 
>
> Key: HIVE-24903
> URL: https://issues.apache.org/jira/browse/HIVE-24903
> Project: Hive
>  Issue Type: Bug
>Reporter: dbgp2021
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello,
> I found that DFSUtil.string2Bytes(String) can be used here instead of 
> String.getBytes(). The API String.getBytes() carries a potential risk of 
> UnsupportedEncodingException, since its behavior when the string cannot be 
> encoded in the default charset is unspecified. The recommended API, 
> DFSUtil.string2Bytes(String), provides more control over the encoding 
> process and avoids this exception.
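For background (a generic sketch, not the actual Hive patch): the checked UnsupportedEncodingException is declared only by the getBytes(String charsetName) overload. Passing a java.nio.charset.Charset constant sidesteps it and pins the encoding instead of relying on the platform default, which is the same effect DFSUtil.string2Bytes achieves:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesSketch {

    // Encoding with an explicit Charset constant never throws the checked
    // UnsupportedEncodingException that getBytes(String charsetName) declares,
    // and it makes the encoding deterministic across platforms.
    static byte[] toUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toUtf8Bytes("hive").length); // 4
    }
}
```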



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25168) Add mutable validWriteIdList

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25168:
--
Labels: pull-request-available  (was: )

> Add mutable validWriteIdList
> 
>
> Key: HIVE-25168
> URL: https://issues.apache.org/jira/browse/HIVE-25168
> Project: Hive
>  Issue Type: New Feature
>  Components: storage-api
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Although the current implementation of validWriteIdList is not strictly 
> immutable, it in effect provides a read-only snapshot view. This change adds 
> another class that provides functionality for manipulating the writeIdList. 
> We could use it to keep the writeIdList up to date in an external cache 
> layer for event-based metadata refreshing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25168) Add mutable validWriteIdList

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25168?focusedWorklogId=602661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602661
 ]

ASF GitHub Bot logged work on HIVE-25168:
-

Author: ASF GitHub Bot
Created on: 27/May/21 00:07
Start Date: 27/May/21 00:07
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request #2324:
URL: https://github.com/apache/hive/pull/2324


   * A new interface and a new class for mutable validWriteIdList
   * A new test for mutable validWriteIdList
   
   
   
   ### What changes were proposed in this pull request?
   
   Add a new mutable validWriteIdList
   
   ### Why are the changes needed?
   
   For external metadata cache layer, it is useful to keep the validWriteIdList 
up-to-date by events.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   A new test added.
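A hedged sketch of what an event-driven mutable write-id list could look like (names and semantics are illustrative only; the actual interface added by this PR lives in storage-api and will differ):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Illustrative mutable view over a write-id snapshot: metadata events
// allocate, commit, or leave open individual write ids.
public class MutableWriteIdListSketch {
    private final Set<Long> openWriteIds = new TreeSet<>();
    private long highWatermark;

    public MutableWriteIdListSketch(long highWatermark, Collection<Long> open) {
        this.highWatermark = highWatermark;
        this.openWriteIds.addAll(open);
    }

    // Apply an "allocate write id" event from the metadata event stream.
    public void addOpenWriteId(long writeId) {
        openWriteIds.add(writeId);
        highWatermark = Math.max(highWatermark, writeId);
    }

    // Apply a "commit" event: the write id becomes visible to readers.
    public void commitWriteId(long writeId) {
        openWriteIds.remove(writeId);
    }

    // Valid = at or below the high watermark and not still open.
    public boolean isWriteIdValid(long writeId) {
        return writeId <= highWatermark && !openWriteIds.contains(writeId);
    }

    public static void main(String[] args) {
        MutableWriteIdListSketch ids =
            new MutableWriteIdListSketch(5, Arrays.asList(4L));
        ids.commitWriteId(4);
        System.out.println(ids.isWriteIdValid(4)); // true
    }
}
```

The design point is that an external cache can replay allocation/commit events against such a structure instead of re-fetching a fresh snapshot on every refresh.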


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602661)
Remaining Estimate: 0h
Time Spent: 10m

> Add mutable validWriteIdList
> 
>
> Key: HIVE-25168
> URL: https://issues.apache.org/jira/browse/HIVE-25168
> Project: Hive
>  Issue Type: New Feature
>  Components: storage-api
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Although the current implementation of validWriteIdList is not strictly 
> immutable, it in effect provides a read-only snapshot view. This change adds 
> another class that provides functionality for manipulating the writeIdList. 
> We could use it to keep the writeIdList up to date in an external cache 
> layer for event-based metadata refreshing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25168) Add mutable validWriteIdList

2021-05-26 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai reassigned HIVE-25168:
-


> Add mutable validWriteIdList
> 
>
> Key: HIVE-25168
> URL: https://issues.apache.org/jira/browse/HIVE-25168
> Project: Hive
>  Issue Type: New Feature
>  Components: storage-api
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> Although the current implementation of validWriteIdList is not strictly 
> immutable, it in effect provides a read-only snapshot view. This change adds 
> another class that provides functionality for manipulating the writeIdList. 
> We could use it to keep the writeIdList up to date in an external cache 
> layer for event-based metadata refreshing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-25167) Invalid table alias or column reference when Cbo is enabled while CTAS from left join

2021-05-26 Thread Pritha Dawn (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351976#comment-17351976
 ] 

Pritha Dawn edited comment on HIVE-25167 at 5/26/21, 7:31 PM:
--

This is Hive on Tez. The stack trace is:

Invalid table alias or column reference '$hdt$_1': (possible column names are: column_1, column_4, dt)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12869)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12809)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12777)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12755)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperatorChildren(SemanticAnalyzer.java:8870)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:9126)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:9331)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11553)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11433)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12236)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:621)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12326)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1816)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1811)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
 at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
 at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
 at org.apache.hive.service.cli.operation.Operation.run(Operation.java:260)
 at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:575)
 at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:561)
 at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
 at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:566)
 at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)


was (Author: pritha):
The stack trace is :

Invalid table alias or column reference '$hdt$_1': (possible column names are: 
column_1, column_4, dt)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12869)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12809)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12777)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12755)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperatorChildren(SemanticAnalyzer.java:8870)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:9126)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:9331)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11553)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11433)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12236)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:621)
 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12326)
 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1816)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1811)
 at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
 at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
 at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
 at 

[jira] [Work logged] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?focusedWorklogId=602462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602462
 ]

ASF GitHub Bot logged work on HIVE-25141:
-

Author: ASF GitHub Bot
Created on: 26/May/21 18:12
Start Date: 26/May/21 18:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2299:
URL: https://github.com/apache/hive/pull/2299#issuecomment-849008605


   @miklosgergely I was a little worried about having to change the timeout to 
get it to pass, but I have played with it quite a bit and it just seems flaky.  
Please let me know if otherwise we are good here.  Thanks!!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602462)
Time Spent: 1h  (was: 50m)

> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Remove "log *and* throw" (it should be one or the other)
>  * Remove superfluous code
>  * Ensure the stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25167) Invalid table alias or column reference when Cbo is enabled while CTAS from left join

2021-05-26 Thread Pritha Dawn (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351976#comment-17351976
 ] 

Pritha Dawn commented on HIVE-25167:


The stack trace is:

Invalid table alias or column reference '$hdt$_1': (possible column names are: column_1, column_4, dt)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12869)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12809)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12777)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12755)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperatorChildren(SemanticAnalyzer.java:8870)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:9126)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:9331)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11553)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11433)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12236)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:621)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12326)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1816)
 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1811)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
 at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
 at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
 at org.apache.hive.service.cli.operation.Operation.run(Operation.java:260)
 at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:575)
 at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:561)
 at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
 at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:566)
 at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)

> Invalid table alias or column reference when Cbo is enabled while CTAS from 
> left join
> -
>
> Key: HIVE-25167
> URL: https://issues.apache.org/jira/browse/HIVE-25167
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.0
>Reporter: Pritha Dawn
>Priority: Critical
>
> The query fails with a semantic exception when cbo is enabled but succeeds 
> when cbo is turned off.
> Error Message is "Invalid table alias or column reference '$hdt$_1': 
> (possible column names are: column_1, column_4, dt)"
>  
> CREATE TABLE A ( 
>  column_1 int, 
>  column_2 int, 
>  column_3 int, 
>  column_4 int, 
>  dt int);
>  
> Insert into A values (1,2,3,4,5);
> CREATE TABLE B ( 
> column_1 int, 
> column_2 int, 
> dt int);
> Insert into B values (1,2,3);
> CREATE TABLE C ( 
> column_1 int, 
> dt int, 
> column_5 int);
>  
> Insert into C values (1,2,3);
> explain create table test_cbo9 AS
> SELECT
>  ACCT.COLUMN_1
>  , ACCT.DT
>  , ACCT.COLUMN_4
> FROM (
>  SELECT
>  A.COLUMN_1 AS COLUMN_1
>  , A.COLUMN_2 AS COLUMN_2
>  , A.COLUMN_3 AS COLUMN_3
>  , B.COLUMN_2 AS B_COLUMN_2
>  , A.DT AS DT
>  , A.COLUMN_4 AS COLUMN_4
>  FROM A
>  LEFT JOIN B
>  ON A.COLUMN_1 = B.COLUMN_1
> ) AS ACCT
> LEFT JOIN C
> ON ACCT.COLUMN_1 = C.COLUMN_1
> AND ACCT.B_COLUMN_2 = 2
> AND C.DT = 2;
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=602456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602456
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 26/May/21 17:37
Start Date: 26/May/21 17:37
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2282:
URL: https://github.com/apache/hive/pull/2282#discussion_r639979235



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -2197,9 +2197,14 @@ private static void 
populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
HIVE_PARQUET_DATE_PROLEPTIC_GREGORIAN_DEFAULT("hive.parquet.date.proleptic.gregorian.default",
 false,
   "This value controls whether date type in Parquet files was written 
using the hybrid or proleptic\n" +
   "calendar. Hybrid is the default."),
-
HIVE_PARQUET_TIMESTAMP_LEGACY_CONVERSION_ENABLED("hive.parquet.timestamp.legacy.conversion.enabled",
 true,

Review comment:
   I explored various options (e.g., property validator, alternative 
property name)  but there does not seem to exist a reliable way of throwing an 
exception when the old property is used explicitly by the client.
   
   When property validation is on then we have some [additional 
checks](https://github.com/apache/hive/blob/95ab0c05ab68284355d431e352fe59a3e2dd9d6c/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java#L254)
 but even like that I think there are holes (e.g., property set in 
`hive-site.xml`).
   
   If we want to protect users who might set this property explicitly despite 
the fact that it is clearly noted to be for debugging purposes then I guess we 
need to retain the old property name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602456)
Time Spent: 1h 20m  (was: 1h 10m)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25141) Review Error Level Logging in HMS Module

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25141?focusedWorklogId=602425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602425
 ]

ASF GitHub Bot logged work on HIVE-25141:
-

Author: ASF GitHub Bot
Created on: 26/May/21 16:40
Start Date: 26/May/21 16:40
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2299:
URL: https://github.com/apache/hive/pull/2299#discussion_r639913863



##
File path: ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java
##
@@ -1692,7 +1692,7 @@ public void testReplOpenTxn() throws Exception {
 int numTxn = 5;
 String[] output = TestTxnDbUtil.queryToString(conf, "SELECT 
MAX(\"TXN_ID\") + 1 FROM \"TXNS\"").split("\n");
 long startTxnId = Long.parseLong(output[1].trim());
-txnHandler.setOpenTxnTimeOutMillis(3);
+txnHandler.setOpenTxnTimeOutMillis(6);

Review comment:
   This timeout (failing test) was giving me a lot of heartburn.  I have 
been playing around with it on `master` branch and it's pretty sensitive to 
change.  Lowering it just a little causes timeout failures.  I have bumped it 
here to get it to pass and to make this test less flaky.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602425)
Time Spent: 50m  (was: 40m)

> Review Error Level Logging in HMS Module
> 
>
> Key: HIVE-25141
> URL: https://issues.apache.org/jira/browse/HIVE-25141
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Remove "log *and* throw" (it should be one or the other)
>  * Remove superfluous code
>  * Ensure the stack traces are being logged (and not just the Exception 
> message) to ease troubleshooting
>  * Remove double-printing the Exception message (SLF4J dictates that the 
> Exception message will be printed as part of the logger's formatting)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25104) Backward incompatible timestamp serialization in Parquet for certain timezones

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25104?focusedWorklogId=602378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602378
 ]

ASF GitHub Bot logged work on HIVE-25104:
-

Author: ASF GitHub Bot
Created on: 26/May/21 14:49
Start Date: 26/May/21 14:49
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #2282:
URL: https://github.com/apache/hive/pull/2282#issuecomment-848834410


   @klcopp , could you take a look too? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602378)
Time Spent: 1h 10m  (was: 1h)

> Backward incompatible timestamp serialization in Parquet for certain timezones
> --
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro, Orc).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in 
> Parquet files is not backwards compatible. In other words writing timestamps 
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them 
> with another (not including the previous issues) may lead to different 
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24537:
--
Fix Version/s: 4.0.0

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351776#comment-17351776
 ] 

Panagiotis Garefalakis commented on HIVE-24537:
---

Resolved via https://github.com/apache/hive/pull/2224

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?focusedWorklogId=602333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602333
 ]

ASF GitHub Bot logged work on HIVE-24537:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:23
Start Date: 26/May/21 13:23
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #2224:
URL: https://github.com/apache/hive/pull/2224#issuecomment-848766238


   Merged to master, thanks @maheshk114  for the review! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602333)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis resolved HIVE-24537.
---
Resolution: Fixed

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?focusedWorklogId=602332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602332
 ]

ASF GitHub Bot logged work on HIVE-24537:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:22
Start Date: 26/May/21 13:22
Worklog Time Spent: 10m 
  Work Description: pgaref merged pull request #2224:
URL: https://github.com/apache/hive/pull/2224


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602332)
Time Spent: 1h 10m  (was: 1h)

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602330
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:17
Start Date: 26/May/21 13:17
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639715427



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -860,15 +804,10 @@ public static void cleanupInstance() {
 }
   }
 
-  private static ScheduledExecutorService invalidationExecutor = null;
-  private static ExecutorService deletionExecutor = null;
-
-  static {
-ThreadFactory threadFactory =
-new 
ThreadFactoryBuilder().setDaemon(true).setNameFormat("QueryResultsCache 
%d").build();
-invalidationExecutor = 
Executors.newSingleThreadScheduledExecutor(threadFactory);
-deletionExecutor = Executors.newSingleThreadExecutor(threadFactory);
-  }
+  private static ScheduledExecutorService invalidationExecutor = 
Executors.newSingleThreadScheduledExecutor(
+  new 
ThreadFactoryBuilder().setDaemon(true).setNameFormat("QueryCacheInvalidator 
%d").build());
+  private static ExecutorService deletionExecutor = 
Executors.newSingleThreadScheduledExecutor(
+  new 
ThreadFactoryBuilder().setDaemon(true).setNameFormat("QueryCacheDeletor 
%d").build());

Review comment:
   Re-factor and give each pool its own name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602330)
Time Spent: 2h 50m  (was: 2h 40m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602331
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:17
Start Date: 26/May/21 13:17
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2312:
URL: https://github.com/apache/hive/pull/2312


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602331)
Time Spent: 3h  (was: 2h 50m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602329
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:16
Start Date: 26/May/21 13:16
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639715093



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -726,35 +709,24 @@ private boolean entryMatches(LookupInfo lookupInfo, 
CacheEntry entry, Set<CacheEntry> entriesToRemove)
tableToEntryMap.remove(tableName, entry));
+

Review comment:
   Moved out from having its own method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602329)
Time Spent: 2h 40m  (was: 2.5h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602328
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:15
Start Date: 26/May/21 13:15
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639714420



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -601,66 +591,59 @@ public boolean setEntryValid(CacheEntry cacheEntry, 
FetchWork fetchWork) {
   }
 
   public void clear() {
-Lock writeLock = rwLock.writeLock();
+cacheWriteLock.lock();
 try {
-  writeLock.lock();
   LOG.info("Clearing the results cache");
-  CacheEntry[] allEntries = null;
-  synchronized (lru) {
-allEntries = lru.keySet().toArray(EMPTY_CACHEENTRY_ARRAY);
-  }
-  for (CacheEntry entry : allEntries) {

Review comment:
   The `cacheWrite` lock is held here so no need to synchronize on `lru`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602328)
Time Spent: 2.5h  (was: 2h 20m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602327
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:14
Start Date: 26/May/21 13:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639713660



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -493,20 +491,17 @@ public CacheEntry addToCache(QueryInfo queryInfo, 
ValidTxnWriteIdList txnWriteId
 addedEntry.queryInfo = queryInfo;
 addedEntry.txnWriteIdList = txnWriteIdList;
 
-Lock writeLock = rwLock.writeLock();
+cacheWriteLock.lock();
 try {
-  writeLock.lock();
-

Review comment:
   Move locks outside the try block as is the official Java Trails way.
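The lock-before-try idiom the comment describes can be sketched as follows. This is an illustrative standalone class, not the actual QueryResultsCache code: calling `lock()` immediately before the `try` means that if `lock()` itself fails, the `finally` block never runs `unlock()` on a lock that was never acquired (which would raise `IllegalMonitorStateException` and mask the original failure).

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOutsideTry {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private int value;

    public void set(int v) {
        Lock writeLock = rwLock.writeLock();
        // Acquire *before* the try block; only code that runs while the
        // lock is definitely held sits inside it.
        writeLock.lock();
        try {
            value = v;
        } finally {
            writeLock.unlock();
        }
    }

    public int get() {
        Lock readLock = rwLock.readLock();
        readLock.lock();
        try {
            return value;
        } finally {
            readLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockOutsideTry c = new LockOutsideTry();
        c.set(42);
        System.out.println(c.get()); // prints 42
    }
}
```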




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602327)
Time Spent: 2h 20m  (was: 2h 10m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602326
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:14
Start Date: 26/May/21 13:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639713247



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -416,58 +412,60 @@ public Path getCacheDirPath() {
 
   /**
* Check if the cache contains an entry for the requested LookupInfo.
+   *
* @param request
-   * @return  The cached result if there is a match in the cache, or null if 
no match is found.
+   * @return The cached result if there is a match in the cache, or null if no
+   * match is found.
+   * @throws NullPointerException if request is {@code null}
*/
-  public CacheEntry lookup(LookupInfo request) {
-CacheEntry result = null;
+  public CacheEntry lookup(final LookupInfo request) {
+Objects.requireNonNull(request);
 
-LOG.debug("QueryResultsCache lookup for query: {}", request.queryText);
+LOG.debug("Cache lookup for query: {}", request.queryText);
 
+CacheEntry result = null;
 boolean foundPending = false;
-// Cannot entries while we currently hold read lock, so keep track of them 
to delete later.
-Set<CacheEntry> entriesToRemove = new HashSet<CacheEntry>();
-Lock readLock = rwLock.readLock();
+// Cannot modify entries while we currently hold read lock, so keep track 
of
+// them to delete later.
+Set<CacheEntry> entriesToRemove = new HashSet<>();
+cacheReadLock.lock();
 try {
-  // Note: ReentrantReadWriteLock deos not allow upgrading a read lock to 
a write lock.
+  // Note: ReentrantReadWriteLock does not allow upgrading a read lock to 
a write lock.
   // Care must be taken while under read lock, to make sure we do not 
perform any actions
   // which attempt to take a write lock.
-  readLock.lock();
-  Set<CacheEntry> candidates = queryMap.get(request.queryText);
-  if (candidates != null) {
-CacheEntry pendingResult = null;
-for (CacheEntry candidate : candidates) {
-  if (entryMatches(request, candidate, entriesToRemove)) {
-CacheEntryStatus entryStatus = candidate.status;
-if (entryStatus == CacheEntryStatus.VALID) {
-  result = candidate;
-  break;
-} else if (entryStatus == CacheEntryStatus.PENDING && pendingResult == null) {
-  pendingResult = candidate;
-}
+  Collection<CacheEntry> candidates = queryMap.get(request.queryText);
+  // Try to find valid entry, but settle for pending entry if that is all
+  // there is available
+  for (CacheEntry candidate : candidates) {
+if (entryMatches(request, candidate, entriesToRemove)) {
+  CacheEntryStatus entryStatus = candidate.status;
+  if (entryStatus == CacheEntryStatus.VALID) {
+result = candidate;
+break;
+  }
+  if (entryStatus == CacheEntryStatus.PENDING) {
+// Only accept first pending result
+result = (result == null) ? candidate : result;

Review comment:
   `queryMap` is now a Guava `Multimap` and therefore does not return null values (it returns an empty `Collection`).  Otherwise this just simplifies the code.
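For context, the simplification relies on the multimap contract that a lookup never returns null, only a possibly empty collection. Below is a minimal JDK-only sketch of that contract (no Guava dependency; the class and names are illustrative, not from the patch):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NeverNullLookup {
    // Wraps a Map<String, Set<String>> so get() never returns null,
    // mirroring the Multimap contract the review comment relies on.
    private final Map<String, Set<String>> backing = new HashMap<>();

    void put(String key, String value) {
        backing.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    // Always returns a (possibly empty) collection, never null.
    Collection<String> get(String key) {
        return backing.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        NeverNullLookup map = new NeverNullLookup();
        map.put("q1", "entry1");
        // Iterating a missing key is safe: the loop body simply never runs,
        // so the caller needs no null check before the for-loop.
        int visited = 0;
        for (String s : map.get("no-such-query")) {
            visited++;
        }
        System.out.println("visited=" + visited + ", q1=" + map.get("q1").size());
    }
}
```

This is why the `if (candidates != null)` guard can disappear from the lookup loop.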




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602326)
Time Spent: 2h 10m  (was: 2h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602325
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:13
Start Date: 26/May/21 13:13
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639712397



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -355,7 +349,9 @@ public boolean waitForValidStatus() {
   private long maxCacheSize;
   private long maxEntrySize;
   private long maxEntryLifetime;
-  private ReadWriteLock rwLock = new ReentrantReadWriteLock();
+  private final ReadWriteLock cacheLock = new ReentrantReadWriteLock();
+  private final Lock cacheReadLock = cacheLock.readLock();
+  private final Lock cacheWriteLock = cacheLock.writeLock();

Review comment:
   Break out the `cacheLock` into its parts.  This simplifies the code that relies on them.
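The pattern being suggested — caching the read and write halves of a `ReentrantReadWriteLock` in their own `final` fields — can be sketched as follows (a minimal standalone illustration, not the actual Hive class):

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockBreakout {
    private final ReadWriteLock cacheLock = new ReentrantReadWriteLock();
    // Break the lock into its two halves once, so call sites stay short:
    // cacheReadLock.lock() instead of cacheLock.readLock().lock().
    private final Lock cacheReadLock = cacheLock.readLock();
    private final Lock cacheWriteLock = cacheLock.writeLock();

    private int value;

    int read() {
        cacheReadLock.lock();
        try {
            return value;
        } finally {
            cacheReadLock.unlock();
        }
    }

    void write(int v) {
        cacheWriteLock.lock();
        try {
            value = v;
        } finally {
            cacheWriteLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockBreakout b = new LockBreakout();
        b.write(42);
        System.out.println(b.read());
    }
}
```

Since `readLock()`/`writeLock()` always return the same two lock views, hoisting them into fields is purely a readability change.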






Issue Time Tracking
---

Worklog Id: (was: 602325)
Time Spent: 2h  (was: 1h 50m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602324
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:12
Start Date: 26/May/21 13:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639711891



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -319,9 +312,8 @@ public boolean waitForValidStatus() {
 // Status has not changed, continue waiting.
 break;
   }
-
   synchronized (this) {
-this.wait(timeout);
+this.wait();

Review comment:
   This thread is waiting on the state change.  Adding a timeout doesn't cause the loop to exit; instead the thread will just loop, see that there was no change, and `wait` again.  This timeout only adds churn.
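The combination being discussed — a guarded `wait()` loop with every state change going through a setter that calls `notifyAll()` — can be sketched like this (a standalone illustration of the pattern, assuming a simplified two-state status; not the Hive class itself):

```java
public class StatusWait {
    enum Status { PENDING, VALID }

    private Status status = Status.PENDING;

    synchronized void setStatus(Status newStatus) {
        status = newStatus;
        notifyAll(); // wake waiters exactly when the condition may have changed
    }

    synchronized Status awaitChange(Status from) throws InterruptedException {
        // Classic guarded wait: loop on the condition with no timeout.
        // A timeout is unnecessary because setStatus() notifies on every
        // change; waking early would only re-check and wait again.
        while (status == from) {
            wait();
        }
        return status;
    }

    public static void main(String[] args) throws Exception {
        StatusWait s = new StatusWait();
        Thread setter = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            s.setStatus(Status.VALID);
        });
        setter.start();
        System.out.println(s.awaitChange(Status.PENDING));
        setter.join();
    }
}
```

Because the condition check and the `wait()` happen in the same synchronized block, a notification cannot be missed between them.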






Issue Time Tracking
---

Worklog Id: (was: 602324)
Time Spent: 1h 50m  (was: 1h 40m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602320
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:11
Start Date: 26/May/21 13:11
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639710520



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -338,15 +330,17 @@ public boolean waitForValidStatus() {
   }
 
   // Allow lookup by query string
-  private final Map<String, Set<CacheEntry>> queryMap = new HashMap<String, Set<CacheEntry>>();
+  @GuardedBy("cacheLock")
+  private final Multimap<String, CacheEntry> queryMap = HashMultimap.create();

  // LRU. Could also implement LRU as a doubly linked list if CacheEntry keeps its node.
  // Use synchronized map since even read actions cause the lru to get updated.
-  private final Map<CacheEntry, CacheEntry> lru = Collections.synchronizedMap(
-      new LinkedHashMap<CacheEntry, CacheEntry>(INITIAL_LRU_SIZE, LRU_LOAD_FACTOR, true));
+  private final Set<CacheEntry> lru =
+      Collections.synchronizedSet(Collections.newSetFromMap(new LinkedHashMap<>(16, 0.75f, true)));

Review comment:
   Use a `Set` instead of a `Map` with the same key/value.
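The access-ordered `LinkedHashMap` wrapped in `Collections.newSetFromMap` keeps iteration order from least- to most-recently accessed. A small standalone demo (illustrative strings, not cache entries):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Set;

public class LruSetDemo {
    public static void main(String[] args) {
        // accessOrder=true makes iteration run from least- to most-recently
        // accessed; newSetFromMap turns the map into a Set without dummy values.
        Set<String> lru = Collections.synchronizedSet(
            Collections.newSetFromMap(new LinkedHashMap<>(16, 0.75f, true)));
        lru.add("a");
        lru.add("b");
        lru.add("c");
        // Re-adding an element maps to LinkedHashMap.put, which counts as an
        // access, so "a" moves to the most-recently-used end.
        lru.add("a");
        synchronized (lru) { // manual synchronization required when iterating
            System.out.println(lru);
        }
    }
}
```

One subtlety worth noting: `Set.contains` maps to `containsKey`, which does not count as an access for ordering purposes, whereas the old `Map` form could refresh recency via `get`; with the `Set`, recency is refreshed by re-`add`ing the element.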






Issue Time Tracking
---

Worklog Id: (was: 602320)
Time Spent: 1.5h  (was: 1h 20m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602322
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:11
Start Date: 26/May/21 13:11
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639711031



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -260,6 +252,7 @@ private CacheEntryStatus setStatus(CacheEntryStatus newStatus) {
   synchronized (this) {
 CacheEntryStatus oldStatus = status;
 status = newStatus;
+this.notifyAll();
 return oldStatus;

Review comment:
   Notify waiting threads when the status changes.  This is the correct way 
to do this instead of putting `notifyAll` in a bunch of different places.






Issue Time Tracking
---

Worklog Id: (was: 602322)
Time Spent: 1h 40m  (was: 1.5h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?focusedWorklogId=602317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602317
 ]

ASF GitHub Bot logged work on HIVE-24537:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:10
Start Date: 26/May/21 13:10
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2224:
URL: https://github.com/apache/hive/pull/2224#issuecomment-848756722


   LGTM +1




Issue Time Tracking
---

Worklog Id: (was: 602317)
Time Spent: 1h  (was: 50m)

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602318
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:10
Start Date: 26/May/21 13:10
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639709847



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -783,13 +755,11 @@ private boolean hasSpaceForCacheEntry(CacheEntry entry, long size) {
 
   private CacheEntry findEntryToRemove() {
 // Entries should be in LRU order in the keyset iterator.
-Set<CacheEntry> entries = lru.keySet();
 synchronized (lru) {
-  for (CacheEntry removalCandidate : entries) {
-if (removalCandidate.getStatus() != CacheEntryStatus.VALID) {
-  continue;
+  for (CacheEntry removalCandidate : lru) {
+if (removalCandidate.getStatus() == CacheEntryStatus.VALID) {
+  return removalCandidate;
 }
-return removalCandidate;

Review comment:
   Grab the `keySet` _inside_ of the synchronized block to avoid `ConcurrentModificationException`. Remove the use of the 'continue' keyword.






Issue Time Tracking
---

Worklog Id: (was: 602318)
Time Spent: 1h 10m  (was: 1h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602319
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:10
Start Date: 26/May/21 13:10
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639710071



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -726,35 +709,24 @@ private boolean entryMatches(LookupInfo lookupInfo, CacheEntry entry, Set
+entry.getTableNames().forEach(tableName -> tableToEntryMap.remove(tableName, entry));
+
   lru.remove(entry);
   // Should the cache size be updated here, or after the result data has 
actually been deleted?
   cacheSize -= entry.size;
 } finally {
-  rwLock.writeLock().unlock();
+  cacheWriteLock.unlock();
 }
   }
 
-  private void removeFromLookup(CacheEntry entry) {
-String queryString = entry.getQueryText();
-if (!removeFromEntryMap(queryMap, queryString, entry)) {
-  LOG.warn("Attempted to remove entry but it was not in the cache: {}", entry);
-}
-
-// Remove this entry from the table usage mappings.
-entry.getTableNames()
-.forEach(tableName -> removeFromEntryMap(tableToEntryMap, tableName, entry));
-  }
-
-  private void calculateEntrySize(CacheEntry entry, FetchWork fetchWork) throws IOException {
-Path queryResultsPath = fetchWork.getTblDir();
-FileSystem resultsFs = queryResultsPath.getFileSystem(conf);
-ContentSummary cs = resultsFs.getContentSummary(queryResultsPath);
-entry.size = cs.getLength();
-  }
-

Review comment:
   Unused code.






Issue Time Tracking
---

Worklog Id: (was: 602319)
Time Spent: 1h 20m  (was: 1h 10m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602316=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602316
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:09
Start Date: 26/May/21 13:09
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639709076



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -814,37 +784,11 @@ private boolean clearSpaceForCacheEntry(CacheEntry entry, long size) {
   }
 }
 
-LOG.info("Could not free enough space for cache entry for query: [{}] withe size {}",
+LOG.info("Could not free enough space for cache entry for query: [{}] with size {}",
 entry.getQueryText(), size);
 return false;
   }
 
-  private static void addToEntryMap(Map<String, Set<CacheEntry>> entryMap,
-  String key, CacheEntry entry) {
-Set<CacheEntry> entriesForKey = entryMap.get(key);
-if (entriesForKey == null) {
-  entriesForKey = new HashSet<CacheEntry>();
-  entryMap.put(key, entriesForKey);
-}
-entriesForKey.add(entry);
-  }
-
-  private static boolean removeFromEntryMap(Map<String, Set<CacheEntry>> entryMap,
-  String key, CacheEntry entry) {
-Set<CacheEntry> entries = entryMap.get(key);
-if (entries == null) {
-  return false;
-}
-boolean deleted = entries.remove(entry);
-if (!deleted) {
-  return false;
-}
-if (entries.isEmpty()) {
-  entryMap.remove(key);
-}
-return true;
-  }
-

Review comment:
   Using Guava `Multimap` makes this dead code
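The removed helper implemented "remove a value, then prune the key when its bucket empties" by hand; a multimap does the equivalent in one call. A JDK-only sketch of that hand-rolled logic (illustrative names, not the Hive code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MultimapRemoveSketch {
    // Hand-rolled removeFromEntryMap-style logic: remove a value and prune
    // the key when its collection becomes empty. Guava's Multimap.remove
    // collapses this into a single call.
    static boolean remove(Map<String, Set<String>> map, String key, String value) {
        Set<String> values = map.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            map.remove(key); // prune empty bucket so the map does not leak keys
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        map.computeIfAbsent("query", k -> new HashSet<>()).add("entry");
        System.out.println(remove(map, "query", "entry") + " " + map.isEmpty());
    }
}
```

Replacing this with a multimap deletes the boilerplate while keeping the same observable behavior.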






Issue Time Tracking
---

Worklog Id: (was: 602316)
Time Spent: 1h  (was: 50m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602315
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:09
Start Date: 26/May/21 13:09
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2312:
URL: https://github.com/apache/hive/pull/2312#discussion_r639708841



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
##
@@ -885,8 +824,6 @@ public void run() {
 
   private static void cleanupEntry(final CacheEntry entry) {
 Preconditions.checkState(entry.getStatus() == CacheEntryStatus.INVALID);
-final HiveConf conf = getInstance().conf;
-

Review comment:
   Unused code.






Issue Time Tracking
---

Worklog Id: (was: 602315)
Time Spent: 50m  (was: 40m)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-25157) Clean up QueryResultsCache Code

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25157?focusedWorklogId=602312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602312
 ]

ASF GitHub Bot logged work on HIVE-25157:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:08
Start Date: 26/May/21 13:08
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #2312:
URL: https://github.com/apache/hive/pull/2312


   




Issue Time Tracking
---

Worklog Id: (was: 602312)
Time Spent: 40m  (was: 0.5h)

> Clean up QueryResultsCache Code
> ---
>
> Key: HIVE-25157
> URL: https://issues.apache.org/jira/browse/HIVE-25157
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * Remove superfluous code
>  * Simplify lock usage (remove instances of {{synchronization}})
>  * Simplify code with Guava {{Multimap}}





[jira] [Work logged] (HIVE-24537) Optimise locking in LlapTaskSchedulerService

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24537?focusedWorklogId=602313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602313
 ]

ASF GitHub Bot logged work on HIVE-24537:
-

Author: ASF GitHub Bot
Created on: 26/May/21 13:08
Start Date: 26/May/21 13:08
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2224:
URL: https://github.com/apache/hive/pull/2224#discussion_r639708555



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -1168,16 +1165,13 @@ public void allocateTask(Object task, Resource capability, ContainerId container
 capability, null, null, clock.getTime(), id);
 LOG.info("Received allocateRequest. task={}, priority={}, capability={}, 
containerId={}",
 task, priority, capability, containerId);
-writeLock.lock();
-try {
-  if (!dagRunning && metrics != null && id != null) {
+if (!dagRunning) {
+  if (metrics != null && id != null) {

Review comment:
   makes sense. +1 from my side. 






Issue Time Tracking
---

Worklog Id: (was: 602313)
Time Spent: 50m  (was: 40m)

> Optimise locking in LlapTaskSchedulerService
> 
>
> Key: HIVE-24537
> URL: https://issues.apache.org/jira/browse/HIVE-24537
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-15 at 11.41.49 AM.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 1. Read lock should suffice for "notifyStarted()".
> 2. Locking in "allocateTask()" can be optimised. 
> 3. Optimize preemptTasks() & preemptTasksFromMap(). This would help in 
> reducing the codepath with writeLock. Currently, it iterates through all 
> tasks.
>  
>   !Screenshot 2020-12-15 at 11.41.49 AM.png|width=847,height=446!





[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602299
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 12:37
Start Date: 26/May/21 12:37
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639683528



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws Exception {
 expectedIcebergProperties.put("custom_property", "initial_val");
 expectedIcebergProperties.put("EXTERNAL", "TRUE");
 expectedIcebergProperties.put("storage_handler", 
HiveIcebergStorageHandler.class.getName());
+expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
   Thanks for the explanation!






Issue Time Tracking
---

Worklog Id: (was: 602299)
Time Spent: 1h 10m  (was: 1h)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602296=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602296
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 12:35
Start Date: 26/May/21 12:35
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639681461



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws Exception {
 expectedIcebergProperties.put("custom_property", "initial_val");
 expectedIcebergProperties.put("EXTERNAL", "TRUE");
 expectedIcebergProperties.put("storage_handler", 
HiveIcebergStorageHandler.class.getName());
+expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
   In this PR, I also added new logic to the `IcebergMetaHook` to copy the serdeproperties from the HMS table to the catalog properties. This change was required since the Hive syntax allows providing additional serdeproperties when creating non-native tables. Like:
   `CREATE TABLE  STORED BY ICEBERG WITH SERDEPROPERTIES('my_key'='my_value')`
   
   When we create a table, Hive automatically updates the serdeproperties with the `SERIALIZATION_FORMAT` set to `1` (if it's not overridden in the DDL). 






Issue Time Tracking
---

Worklog Id: (was: 602296)
Time Spent: 1h  (was: 50m)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602295
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 12:33
Start Date: 26/May/21 12:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639680280



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -239,12 +240,42 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
 );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+String query = String.format("CREATE EXTERNAL TABLE customers (customer_id 
BIGINT, first_name STRING, last_name " +
+"STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+"iceBeRg",

Review comment:
   Probably we should not use `%s` here; just substitute the string by hand.

##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -239,12 +240,42 @@ public void testCreateDropTableNonDefaultCatalog() throws TException, Interrupte
 );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+String query = String.format("CREATE EXTERNAL TABLE customers (customer_id 
BIGINT, first_name STRING, last_name " +
+"STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+"iceBeRg",
+testTables.locationForCreateTableSQL(identifier),
+InputFormatConfig.CATALOG_NAME,
+Catalogs.ICEBERG_DEFAULT_CATALOG_NAME);
+shell.executeStatement(query);
+Assert.assertNotNull(testTables.loadTable(identifier));
+  }
+
+  @Test
+  public void testCreateTableStoredByIcebergWithSerdeProperties() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+String query = String.format("CREATE EXTERNAL TABLE customers (customer_id 
BIGINT, first_name STRING, last_name " +
+"STRING) STORED BY %s WITH SERDEPROPERTIES('%s'='%s') %s 
TBLPROPERTIES ('%s'='%s')",
+"iceberg",

Review comment:
   Probably we should not use `%s` here; just substitute the string by hand.






Issue Time Tracking
---

Worklog Id: (was: 602295)
Time Spent: 50m  (was: 40m)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=602293=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602293
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 26/May/21 12:28
Start Date: 26/May/21 12:28
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r639676968



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -213,7 +216,12 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 releaseResources();
   }
 
+  if (SessionState.get() != null) {
+// Remove any query state reference from the session state
+SessionState.get().removeQueryState(getConf().get(HiveConf.ConfVars.HIVEQUERYID.varname));
+  }
   driverState.executionFinishedWithLocking(isFinishedWithError);
+
 }

Review comment:
   nit of the nit for the newlines:
   ```suggestion
 }
   
 if (SessionState.get() != null) {
    // Remove any query state reference from the session state
    SessionState.get().removeQueryState(getConf().get(HiveConf.ConfVars.HIVEQUERYID.varname));
 }
   
 driverState.executionFinishedWithLocking(isFinishedWithError);
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602293)
Time Spent: 8.5h  (was: 8h 20m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query
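The two goals above (load once per query, pin one snapshot) can be sketched with a per-queryId cache. This is a hedged illustration with invented class and method names, not Hive's actual API: repeated `Catalogs.loadTable`-style calls within one query hit the cache, the whole entry is dropped when the query finishes, and because every caller reuses the same cached object, the query sees a single snapshot of the table metadata.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical query-scoped cache keyed by queryId, then by table name.
class QueryTableCache {
  private static final Map<String, Map<String, Object>> CACHE = new ConcurrentHashMap<>();

  static Object getOrLoad(String queryId, String table, Function<String, Object> loader) {
    return CACHE.computeIfAbsent(queryId, q -> new ConcurrentHashMap<>())
        .computeIfAbsent(table, loader);
  }

  static void invalidate(String queryId) {
    CACHE.remove(queryId); // release everything cached for this query
  }
}

public class QueryTableCacheDemo {
  public static void main(String[] args) {
    final int[] loads = {0};
    Object first = QueryTableCache.getOrLoad("query-1", "default.tbl", t -> {
      loads[0]++; // simulate an expensive catalog load
      return new Object();
    });
    Object second = QueryTableCache.getOrLoad("query-1", "default.tbl", t -> {
      loads[0]++;
      return new Object();
    });
    if (first != second || loads[0] != 1) {
      throw new AssertionError("expected one load and a shared table object");
    }
    QueryTableCache.invalidate("query-1");
    System.out.println("loads=" + loads[0]);
  }
}
```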



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602292
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 12:25
Start Date: 26/May/21 12:25
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639674283



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -239,6 +240,36 @@ public void testCreateDropTableNonDefaultCatalog() throws 
TException, Interrupte
 );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+String query = String.format("CREATE EXTERNAL TABLE customers (customer_id BIGINT, first_name STRING, last_name " +
+"STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+"ICEBERG",

Review comment:
   Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602292)
Time Spent: 40m  (was: 0.5h)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25166) Query with multiple count(distinct) fails

2021-05-26 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25166:
-


> Query with multiple count(distinct) fails
> -
>
> Key: HIVE-25166
> URL: https://issues.apache.org/jira/browse/HIVE-25166
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> select count(distinct 0), count(distinct null) from alltypes;
> {code}
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not 
> in GROUP BY key 'TOK_NULL'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at 

[jira] [Work started] (HIVE-25166) Query with multiple count(distinct) fails

2021-05-26 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25166 started by Krisztian Kasa.
-
> Query with multiple count(distinct) fails
> -
>
> Key: HIVE-25166
> URL: https://issues.apache.org/jira/browse/HIVE-25166
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> select count(distinct 0), count(distinct null) from alltypes;
> {code}
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not 
> in GROUP BY key 'TOK_NULL'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  

[jira] [Updated] (HIVE-25166) Query with multiple count(distinct) fails

2021-05-26 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25166:
--
Component/s: CBO

> Query with multiple count(distinct) fails
> -
>
> Key: HIVE-25166
> URL: https://issues.apache.org/jira/browse/HIVE-25166
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> select count(distinct 0), count(distinct null) from alltypes;
> {code}
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Expression not 
> in GROUP BY key 'TOK_NULL'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12941)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12883)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4695)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4483)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10960)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10902)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11808)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11665)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11692)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11678)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:618)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12505)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:175)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  

[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=602275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602275
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 26/May/21 11:55
Start Date: 26/May/21 11:55
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r639654144



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -213,7 +216,14 @@ private void runInternal(String command, boolean alreadyCompiled) throws Command
 releaseResources();
   }
 
+  if (SessionState.get() != null) {
+// Clean up every resource object stored in the query state
+driverContext.getQueryState().removeResources();

Review comment:
   I've deleted the `removeResources`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602275)
Time Spent: 8h 20m  (was: 8h 10m)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times which is 
> costly.
> We should:
>  - Cache it maybe even globally based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=602273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602273
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 26/May/21 11:49
Start Date: 26/May/21 11:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2323:
URL: https://github.com/apache/hive/pull/2323#discussion_r639647076



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5218,31 +5218,55 @@ private void 
removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
   return;
 }
 
-boolean success = false;
 Query query = null;
+LOG.debug("execute removeUnusedColumnDescriptor");
+DatabaseProduct dbProduct = 
DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), 
conf);
 
-try {
-  openTransaction();
-  LOG.debug("execute removeUnusedColumnDescriptor");
+/**
+ * In order to work around the performance issue caused by Oracle not supporting the limit statement,
+ * HIVE-9447 makes all the backend DBs run select count(1) from SDS where SDS.CD_ID=? to check if the
+ * specific CD_ID is referenced in the SDS table before dropping a partition. This select count(1)
+ * statement does not scale well in Postgres, and there is no index for the CD_ID column in the SDS table.
+ * For a SDS table with 1.5 million rows, select count(1) averages 700ms without an index, versus
+ * 10-20ms with one. But the statement used before
+ * HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms.
+ */
 
-  query = pm.newQuery("select count(1) from " +
-"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
+if (dbProduct.isPOSTGRES()) {
+  query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");
   query.declareParameters("MColumnDescriptor inCD");
-  long count = ((Long)query.execute(oldCD)).longValue();
-
+  List<MStorageDescriptor> referencedSDs = listStorageDescriptorsWithCD(oldCD, query);

Review comment:
   My guess is that MySQL could also use limit, so we might want to test the performance for different engines and use the appropriate query for each.
   
   What do you think?
   
   Thanks,
   Peter
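The per-engine idea in this review can be sketched as follows. This is a hedged illustration, not the patch's code: the engine detection and exact SQL strings are invented for the example. Engines with `LIMIT` get an existence probe that stops at the first referencing row, while the portable fallback keeps the `count(1)` form that also works on Oracle.

```java
// Illustrative per-engine choice of the CD-usage check statement.
public class CdUsageQuery {
  enum Db { POSTGRES, MYSQL, ORACLE, OTHER }

  static String usageCheckSql(Db db) {
    switch (db) {
      case POSTGRES:
      case MYSQL:
        return "SELECT 1 FROM \"SDS\" WHERE \"CD_ID\" = ? LIMIT 1"; // stops at first match
      default:
        return "SELECT count(1) FROM \"SDS\" WHERE \"CD_ID\" = ?";  // scans all matches
    }
  }

  public static void main(String[] args) {
    if (!usageCheckSql(Db.MYSQL).endsWith("LIMIT 1")
        || usageCheckSql(Db.ORACLE).contains("LIMIT")) {
      throw new AssertionError("unexpected statement shape");
    }
    System.out.println(usageCheckSql(Db.POSTGRES));
  }
}
```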




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602273)
Time Spent: 40m  (was: 0.5h)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In order to work around the performance issue caused by Oracle not supporting 
> the limit statement, HIVE-9447 makes all the backend DBs run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> the SDS table before dropping a partition. This select count(1) statement does 
> not scale well in Postgres, and there is no index for the CD_ID column in the SDS table.
> For a SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index, versus 10-20ms with one. But the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=602271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602271
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 26/May/21 11:48
Start Date: 26/May/21 11:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2323:
URL: https://github.com/apache/hive/pull/2323#discussion_r639649310



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5269,6 +5293,32 @@ private void preDropStorageDescriptor(MStorageDescriptor msd) {
 removeUnusedColumnDescriptor(mcd);
   }
 
+  /**
+   * Get a list of storage descriptors that reference a particular Column Descriptor
+   * @param oldCD the column descriptor to get storage descriptors for
+   * @return a list of storage descriptors
+   */
+  private List<MStorageDescriptor> listStorageDescriptorsWithCD(MColumnDescriptor oldCD, Query query) {
+boolean success = false;
+List sds = null;
+try {
+  openTransaction();

Review comment:
   Why did we move the transaction here?
   If the previous check runs in a different transaction, then we might end up in a situation where the check did not find any usage for the descriptor, but before the transaction for the removal started, someone inserted new data and started to use the CD.
   
   I think it is only a theoretical possibility, but we should check this out.
   
   Thanks,
   Peter
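The race Peter describes can be modeled in a toy example. This is a hedged sketch with invented names, not the metastore's code: a `synchronized` method stands in for a single database transaction. When the usage check and the delete share one scope, no reference can appear between them.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: sdsReferences holds CD_IDs referenced by SDS rows,
// columnDescriptors holds live column descriptors.
public class CdDropRace {
  private final Set<Long> sdsReferences = new HashSet<>();
  private final Set<Long> columnDescriptors = new HashSet<>();

  // Check and removal in one atomic step: no reference can sneak in between.
  synchronized boolean removeIfUnused(long cdId) {
    if (sdsReferences.contains(cdId)) {
      return false; // still referenced, keep it
    }
    return columnDescriptors.remove(cdId);
  }

  public static void main(String[] args) {
    CdDropRace store = new CdDropRace();
    store.columnDescriptors.add(42L);
    store.sdsReferences.add(42L);
    boolean droppedWhileUsed = store.removeIfUnused(42L); // referenced: must refuse
    store.sdsReferences.remove(42L);
    boolean droppedWhenFree = store.removeIfUnused(42L);  // unreferenced: may drop
    if (droppedWhileUsed || !droppedWhenFree) {
      throw new AssertionError("CD must only be dropped once unreferenced");
    }
    System.out.println("dropped=" + droppedWhenFree);
  }
}
```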




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602271)
Time Spent: 0.5h  (was: 20m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In order to work around the performance issue caused by Oracle not supporting 
> the limit statement, HIVE-9447 makes all the backend DBs run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> the SDS table before dropping a partition. This select count(1) statement does 
> not scale well in Postgres, and there is no index for the CD_ID column in the SDS table.
> For a SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index, versus 10-20ms with one. But the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=602269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602269
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 26/May/21 11:45
Start Date: 26/May/21 11:45
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2323:
URL: https://github.com/apache/hive/pull/2323#discussion_r639647076



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -5218,31 +5218,55 @@ private void 
removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
   return;
 }
 
-boolean success = false;
 Query query = null;
+LOG.debug("execute removeUnusedColumnDescriptor");
+DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
 
-try {
-  openTransaction();
-  LOG.debug("execute removeUnusedColumnDescriptor");
+/**
+ * In order to work around the performance issue caused by Oracle not supporting the limit statement,
+ * HIVE-9447 makes all the backend DBs run select count(1) from SDS where SDS.CD_ID=? to check if the
+ * specific CD_ID is referenced in the SDS table before dropping a partition. This select count(1)
+ * statement does not scale well in Postgres, and there is no index for the CD_ID column in the SDS table.
+ * For a SDS table with 1.5 million rows, select count(1) averages 700ms without an index, versus
+ * 10-20ms with one. But the statement used before
+ * HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms.
+ */
 
-  query = pm.newQuery("select count(1) from " +
-"org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
+if (dbProduct.isPOSTGRES()) {
+  query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");
   query.declareParameters("MColumnDescriptor inCD");
-  long count = ((Long)query.execute(oldCD)).longValue();
-
+  List<MStorageDescriptor> referencedSDs = listStorageDescriptorsWithCD(oldCD, query);

Review comment:
   Is this really the fastest way to check if the `oldCD` is used?
   Since Postgres supports `limit`, we might want to use that here.
   Also my guess is that MySQL could also use limit, so we might want to test the performance for different engines and use the appropriate query for each.
   
   What do you think?
   
   Thanks,
   Peter




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602269)
Time Spent: 20m  (was: 10m)

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to work around the performance issue caused by Oracle not supporting 
> the limit statement, HIVE-9447 makes all the backend DBs run select count(1) 
> from SDS where SDS.CD_ID=? to check if the specific CD_ID is referenced in 
> the SDS table before dropping a partition. This select count(1) statement does 
> not scale well in Postgres, and there is no index for the CD_ID column in the SDS table.
> For a SDS table with 1.5 million rows, select count(1) averages 700ms 
> without an index, versus 10-20ms with one. But the statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351720#comment-17351720
 ] 

Oleksiy Sayankin edited comment on HIVE-21075 at 5/26/21, 11:23 AM:


Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else {
// use select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
}
{code}


was (Author: osayankin):
Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// use select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
}
{code}

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Comment Edited] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351720#comment-17351720
 ] 

Oleksiy Sayankin edited comment on HIVE-21075 at 5/26/21, 11:22 AM:


Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// use select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
}
{code}


was (Author: osayankin):
Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
}
{code}

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Commented] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351720#comment-17351720
 ] 

Oleksiy Sayankin commented on HIVE-21075:
-

Created [PR|https://github.com/apache/hive/pull/2323].

Solution is simple:

{code}
if (isPostgres()){
// use SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
} else
{
// select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
}
{code}

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread Oleksiy Sayankin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksiy Sayankin reassigned HIVE-21075:
---

Assignee: Oleksiy Sayankin

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21075:
--
Labels: pull-request-available  (was: )

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=602255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602255
 ]

ASF GitHub Bot logged work on HIVE-21075:
-

Author: ASF GitHub Bot
Created on: 26/May/21 11:18
Start Date: 26/May/21 11:18
Worklog Time Spent: 10m 
  Work Description: oleksiy-sayankin opened a new pull request #2323:
URL: https://github.com/apache/hive/pull/2323


   …tgres DB
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602255)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore: Drop partition performance downgrade with Postgres DB
> 
>
> Key: HIVE-21075
> URL: https://issues.apache.org/jira/browse/HIVE-21075
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Yongzhi Chen
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602223
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 10:03
Start Date: 26/May/21 10:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639581441



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +
+"For other catalog types, the HiveIcebergSerDe will create the target 
Iceberg table in the correct catalog " +
+"using the Catalogs.createTable function, but will not register the 
table in HMS since those catalogs do not " +
+"use HiveTableOperations. This means that even though the CTAS query 
succeeds, the user would not be able to " +
+"query this new table from Hive, since HMS does not know about it.",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");

Review comment:
   Got it. Then we might want to create a method for this as well, since we have this code in at least 3 places now.






Issue Time Tracking
---

Worklog Id: (was: 602223)
Time Spent: 3h 10m  (was: 3h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=602217=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602217
 ]

ASF GitHub Bot logged work on HIVE-25102:
-

Author: ASF GitHub Bot
Created on: 26/May/21 09:50
Start Date: 26/May/21 09:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2261:
URL: https://github.com/apache/hive/pull/2261#discussion_r639572145



##
File path: ql/src/java/org/apache/hadoop/hive/ql/Driver.java
##
@@ -213,7 +216,14 @@ private void runInternal(String command, boolean 
alreadyCompiled) throws Command
 releaseResources();
   }
 
+  if (SessionState.get() != null) {
+// Clean up every resource object stored in the query state
+driverContext.getQueryState().removeResources();

Review comment:
   The QueryState in the DriverContext should be removed as well when the query finishes. Otherwise we would still have a leaked QueryState, which is itself a memory leak. So this only makes the leak smaller, but does not fix it (if we have one).






Issue Time Tracking
---

Worklog Id: (was: 602217)
Time Spent: 8h 10m  (was: 8h)

> Cache Iceberg table objects within same query
> -
>
> Key: HIVE-25102
> URL: https://issues.apache.org/jira/browse/HIVE-25102
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> We run Catalogs.loadTable(configuration, props) plenty of times, which is 
> costly.
> We should:
>  - Cache it, maybe even globally, based on the queryId
>  - Make sure that the query uses one snapshot during the whole execution of a 
> single query





[jira] [Updated] (HIVE-25160) Automatically pass on iceberg-handler jar as job dependency

2021-05-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-25160:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review [~pvary], [~lpinter]

> Automatically pass on iceberg-handler jar as job dependency
> ---
>
> Key: HIVE-25160
> URL: https://issues.apache.org/jira/browse/HIVE-25160
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently users are required to run the ADD JAR command in their session if 
> they want to use Iceberg tables in Hive jobs.
> This should be done automatically, similarly to how the hbase, kudu, etc. 
> handlers do it.





[jira] [Work logged] (HIVE-25160) Automatically pass on iceberg-handler jar as job dependency

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25160?focusedWorklogId=602211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602211
 ]

ASF GitHub Bot logged work on HIVE-25160:
-

Author: ASF GitHub Bot
Created on: 26/May/21 09:34
Start Date: 26/May/21 09:34
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2315:
URL: https://github.com/apache/hive/pull/2315


   




Issue Time Tracking
---

Worklog Id: (was: 602211)
Time Spent: 0.5h  (was: 20m)

> Automatically pass on iceberg-handler jar as job dependency
> ---
>
> Key: HIVE-25160
> URL: https://issues.apache.org/jira/browse/HIVE-25160
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602202
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 09:24
Start Date: 26/May/21 09:24
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639553457



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -239,6 +240,36 @@ public void testCreateDropTableNonDefaultCatalog() throws 
TException, Interrupte
 );
   }
 
+  @Test
+  public void testCreateTableStoredByIceberg() {
+TableIdentifier identifier = TableIdentifier.of("default", "customers");
+String query = String.format("CREATE EXTERNAL TABLE customers (customer_id 
BIGINT, first_name STRING, last_name " +
+"STRING) STORED BY %s %s TBLPROPERTIES ('%s'='%s')",
+"ICEBERG",

Review comment:
   I would change most of the tests to `STORED BY ICEBERG` and keep only a few with the original class name.
   Also I would like to try out `stored by iCeBerG`.






Issue Time Tracking
---

Worklog Id: (was: 602202)
Time Spent: 20m  (was: 10m)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25162) Add support for CREATE TABLE ... STORED BY ICEBERG statements

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25162?focusedWorklogId=602203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602203
 ]

ASF GitHub Bot logged work on HIVE-25162:
-

Author: ASF GitHub Bot
Created on: 26/May/21 09:24
Start Date: 26/May/21 09:24
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2317:
URL: https://github.com/apache/hive/pull/2317#discussion_r639554024



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java
##
@@ -587,6 +618,7 @@ public void testIcebergAndHmsTableProperties() throws 
Exception {
 expectedIcebergProperties.put("custom_property", "initial_val");
 expectedIcebergProperties.put("EXTERNAL", "TRUE");
 expectedIcebergProperties.put("storage_handler", 
HiveIcebergStorageHandler.class.getName());
+expectedIcebergProperties.put(serdeConstants.SERIALIZATION_FORMAT, "1");

Review comment:
   Why is this change needed?






Issue Time Tracking
---

Worklog Id: (was: 602203)
Time Spent: 0.5h  (was: 20m)

> Add support for CREATE TABLE ... STORED BY ICEBERG statements
> -
>
> Key: HIVE-25162
> URL: https://issues.apache.org/jira/browse/HIVE-25162
> Project: Hive
>  Issue Type: New Feature
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25163) UnsupportedTemporalTypeException when starting llap

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25163:
--
Labels: pull-request-available  (was: )

> UnsupportedTemporalTypeException when starting llap
> ---
>
> Key: HIVE-25163
> URL: https://issues.apache.org/jira/browse/HIVE-25163
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When trying to start the LLAP service I get
> {noformat}
> java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: Year
>   at java.time.Instant.getLong(Instant.java:603)
>   at 
> java.time.format.DateTimePrintContext$1.getLong(DateTimePrintContext.java:205)
>   at 
> java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:298)
>   at 
> java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2551)
>   at 
> java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2190)
>   at 
> java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1746)
>   at 
> java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1720)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.startLlap(LlapServiceDriver.java:301)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.run(LlapServiceDriver.java:133)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.main(LlapServiceDriver.java:386)
> {noformat}





[jira] [Work logged] (HIVE-25163) UnsupportedTemporalTypeException when starting llap

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25163?focusedWorklogId=602200=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602200
 ]

ASF GitHub Bot logged work on HIVE-25163:
-

Author: ASF GitHub Bot
Created on: 26/May/21 09:16
Start Date: 26/May/21 09:16
Worklog Time Spent: 10m 
  Work Description: stoty opened a new pull request #2322:
URL: https://github.com/apache/hive/pull/2322


   * use LocalDateTime instead of Instant
   
   ### What changes were proposed in this pull request?
   Use LocalDateTime instead of Instant to generate an llap version string when 
HIVE_VERSION is not set.
   
   
   ### Why are the changes needed?
   The current code throws a java.time.temporal.UnsupportedTemporalTypeException, making it impossible to start the LLAP jobs when HIVE_VERSION is not set.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. 
   
   ### How was this patch tested?
   Built Hive and started llap successfully 
   (well, at least it got past the point where the exception was thrown).
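   The failure mode can be reproduced in isolation: java.time.Instant carries no calendar fields (Year, MonthOfYear, ...), so pattern-based formatting of it throws, while LocalDateTime works. A standalone demo, not the Hive code:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.UnsupportedTemporalTypeException;

public class TimestampFormatDemo {

    /** Returns true if the given temporal can be formatted with the pattern. */
    static boolean canFormat(DateTimeFormatter fmt, TemporalAccessor t) {
        try {
            fmt.format(t);
            return true;
        } catch (UnsupportedTemporalTypeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");
        // Instant has no Year field, so formatting it with this pattern fails.
        System.out.println("Instant ok?       " + canFormat(fmt, Instant.now()));
        // LocalDateTime has the calendar fields the pattern asks for.
        System.out.println("LocalDateTime ok? " + canFormat(fmt, LocalDateTime.now()));
    }
}
```

   Switching the formatted value from Instant to LocalDateTime, as the PR does, is therefore enough to avoid the exception.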
   




Issue Time Tracking
---

Worklog Id: (was: 602200)
Remaining Estimate: 0h
Time Spent: 10m

> UnsupportedTemporalTypeException when starting llap
> ---
>
> Key: HIVE-25163
> URL: https://issues.apache.org/jira/browse/HIVE-25163
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602190
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:25
Start Date: 26/May/21 08:25
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639509256



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {
+return Optional.ofNullable(conf).map(FileSinkDesc::getTableInfo)
+.map(TableDesc::getProperties)
+.map(props -> 
props.getProperty(hive_metastoreConstants.META_TABLE_STORAGE))
+.map(handler -> {
+  try {
+return HiveUtils.getStorageHandler(hconf, handler);
+  } catch (HiveException e) {
+return null;

Review comment:
   If we return null in any of the map functions in Optional, then it will fall back to the `orElse` part (i.e. subsequent map functions won't fail).

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {
+return Optional.ofNullable(conf).map(FileSinkDesc::getTableInfo)
+.map(TableDesc::getProperties)
+.map(props -> 
props.getProperty(hive_metastoreConstants.META_TABLE_STORAGE))
+.map(handler -> {
+  try {
+return HiveUtils.getStorageHandler(hconf, handler);
+  } catch (HiveException e) {
+return null;
+  }
+})
+.map(HiveStorageHandler::alwaysUnpartitioned)

Review comment:
   No, it shouldn't. If we return null in any of the map functions in Optional, then it will fall back to the `orElse` part (i.e. subsequent map functions won't fail).
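   The short-circuiting behavior described here can be sketched in isolation (the property key, demo class, and return values below are illustrative, not the actual FileSinkOperator code):

```java
import java.util.Optional;
import java.util.Properties;

public class OptionalChainDemo {

    // Mirrors the discussed pattern: a map step that returns null empties the
    // Optional, later map steps are skipped, and orElse supplies the fallback.
    static boolean skipCheck(Properties props) {
        return Optional.ofNullable(props)
                .map(p -> p.getProperty("storage_handler")) // may return null
                .map(name -> Boolean.TRUE)                  // skipped when empty
                .orElse(Boolean.FALSE);                     // fallback on any null
    }

    public static void main(String[] args) {
        Properties withHandler = new Properties();
        withHandler.setProperty("storage_handler", "SomeHandler");
        System.out.println(skipCheck(withHandler));      // handler present
        System.out.println(skipCheck(null));             // null properties
        System.out.println(skipCheck(new Properties())); // property absent
    }
}
```

   Because Optional.map on an empty Optional is a no-op, a null anywhere in the chain cannot cause a NullPointerException in a subsequent step; the chain simply resolves to the orElse value.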






Issue Time Tracking
---

Worklog Id: (was: 602190)
Time Spent: 3h  (was: 2h 50m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25165) Generate & track statistics per event type for incremental load in replication metrics

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25165:
--
Labels: pull-request-available  (was: )

> Generate & track statistics per event type for incremental load in 
> replication metrics
> --
>
> Key: HIVE-25165
> URL: https://issues.apache.org/jira/browse/HIVE-25165
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Generate and track statistics like mean, median, standard deviation, variance, 
> etc. per event type during incremental load and store them in replication 
> statistics.





[jira] [Work logged] (HIVE-25165) Generate & track statistics per event type for incremental load in replication metrics

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25165?focusedWorklogId=602189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602189
 ]

ASF GitHub Bot logged work on HIVE-25165:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:20
Start Date: 26/May/21 08:20
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2321:
URL: https://github.com/apache/hive/pull/2321


   




Issue Time Tracking
---

Worklog Id: (was: 602189)
Remaining Estimate: 0h
Time Spent: 10m

> Generate & track statistics per event type for incremental load in 
> replication metrics
> --
>
> Key: HIVE-25165
> URL: https://issues.apache.org/jira/browse/HIVE-25165
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602187
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:19
Start Date: 26/May/21 08:19
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639504751



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +
+"For other catalog types, the HiveIcebergSerDe will create the target 
Iceberg table in the correct catalog " +
+"using the Catalogs.createTable function, but will not register the 
table in HMS since those catalogs do not " +
+"use HiveTableOperations. This means that even though the CTAS query 
succeeds, the user would not be able to " +
+"query this new table from Hive, since HMS does not know about it.",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");

Review comment:
   It's because I wanted to use a regular Hive table as the source here 
(since that should be a typical CTAS use case for Iceberg)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602187)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602186
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:17
Start Date: 26/May/21 08:17
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639503408



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -218,7 +235,15 @@ private static Schema hiveSchemaOrThrow(Properties 
serDeProperties, Exception pr
   throws SerDeException {
 // Read the configuration parameters
 String columnNames = 
serDeProperties.getProperty(serdeConstants.LIST_COLUMNS);
+    // add partition columns to schema as well, if any
+    if (serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS) != null) {
+      columnNames = columnNames + "," + serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS);
+    }

Review comment:
   Do you mean there should be a newline added after line 241?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602186)
Time Spent: 2h 40m  (was: 2.5h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25165) Generate & track statistics per event type for incremental load in replication metrics

2021-05-26 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25165:
---


> Generate & track statistics per event type for incremental load in 
> replication metrics
> --
>
> Key: HIVE-25165
> URL: https://issues.apache.org/jira/browse/HIVE-25165
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Generate and track statistics like mean, median, standard deviation, variance, 
> etc. per event type during incremental load and store them in replication 
> statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25163) UnsupportedTemporalTypeException when starting llap

2021-05-26 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HIVE-25163:
---
Summary: UnsupportedTemporalTypeException when starting llap  (was: 
java.time.temporal.UnsupportedTemporalTypeException when starting llap)

> UnsupportedTemporalTypeException when starting llap
> ---
>
> Key: HIVE-25163
> URL: https://issues.apache.org/jira/browse/HIVE-25163
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> When trying to start the LLAP service I get
> {noformat}
> java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: Year
>   at java.time.Instant.getLong(Instant.java:603)
>   at 
> java.time.format.DateTimePrintContext$1.getLong(DateTimePrintContext.java:205)
>   at 
> java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:298)
>   at 
> java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2551)
>   at 
> java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2190)
>   at 
> java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1746)
>   at 
> java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1720)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.startLlap(LlapServiceDriver.java:301)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.run(LlapServiceDriver.java:133)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.main(LlapServiceDriver.java:386)
> {noformat}
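For context on the stack trace above: `java.time.Instant` stores only a point on the epoch timeline and exposes no calendar fields such as Year, so formatting one with a date pattern fails exactly as shown. A minimal, self-contained reproduction (the pattern is invented, not necessarily the one LlapServiceDriver uses), together with the usual fix of attaching a zone before formatting:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.UnsupportedTemporalTypeException;

public class InstantFormatDemo {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd-HH-mm-ss");
        Instant now = Instant.now();
        try {
            // Instant has no Year field, so this throws
            // UnsupportedTemporalTypeException
            fmt.format(now);
        } catch (UnsupportedTemporalTypeException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // Attaching a zone supplies the calendar fields, so this succeeds
        System.out.println(fmt.format(now.atZone(ZoneId.systemDefault())));
    }
}
```

The fix in LlapServiceDriver would presumably be along the same lines: format a zoned or local date-time derived from the Instant rather than the raw Instant itself.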



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602180
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:07
Start Date: 26/May/21 08:07
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639492832



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {
+    return Optional.ofNullable(conf).map(FileSinkDesc::getTableInfo)
+        .map(TableDesc::getProperties)
+        .map(props -> props.getProperty(hive_metastoreConstants.META_TABLE_STORAGE))
+        .map(handler -> {
+          try {
+            return HiveUtils.getStorageHandler(hconf, handler);
+          } catch (HiveException e) {
+            return null;
+          }
+        })
+        .map(HiveStorageHandler::alwaysUnpartitioned)

Review comment:
   Wouldn't this end up in a null pointer exception when we have a 
HiveException? 

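For context on the question above: `Optional.map` wraps the mapper's return value with `Optional.ofNullable`, so a mapper that returns `null` collapses the chain to an empty `Optional` rather than causing a `NullPointerException` in the next `.map`. A small standalone sketch (not Hive code) illustrating why the `catch` branch returning `null` is safe:

```java
import java.util.Optional;

public class OptionalNullMapDemo {
    public static void main(String[] args) {
        boolean skip = Optional.of("some.storage.Handler")
                .map(s -> (String) null)   // stands in for the catch branch returning null
                .map(String::toUpperCase)  // never invoked: the chain is already empty
                .map(s -> true)
                .orElse(false);            // empty chain falls through to the default
        System.out.println(skip);          // prints "false"
    }
}
```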
##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -328,6 +345,7 @@ static void overlayTableProperties(Configuration 
configuration, TableDesc tableD
 map.put(InputFormatConfig.TABLE_IDENTIFIER, 
props.getProperty(Catalogs.NAME));
 map.put(InputFormatConfig.TABLE_LOCATION, table.location());
 map.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
+props.put(InputFormatConfig.PARTITION_SPEC, 
PartitionSpecParser.toJson(table.spec()));

Review comment:
   It is not related to this change, but it seems to me that the javadoc 
and the naming of the method are not in sync.  Maybe we should separate the 
logic which is strictly related to storing serializable table data from the 
code which updates table properties.

##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -151,7 +152,23 @@ public void initialize(@Nullable Configuration 
configuration, Properties serDePr
   private void createTableForCTAS(Configuration configuration, Properties 
serDeProperties) {
 serDeProperties.setProperty(TableProperties.ENGINE_HIVE_ENABLED, "true");
 serDeProperties.setProperty(InputFormatConfig.TABLE_SCHEMA, 
SchemaParser.toJson(tableSchema));
+
+    // build partition spec, if any
+    if (serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS) != null) {
+      String[] partCols = serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS).split(",");

Review comment:
   Are we certain that the partition column name cannot contain `,`?  

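To illustrate the concern above: Hive permits unusual column names through back-quoted identifiers, and once the names are joined with `,` into a single property value the original boundaries are lost, so a naive `split(",")` can mis-parse. A hypothetical example (the column names are invented for illustration):

```java
public class SplitHazardDemo {
    public static void main(String[] args) {
        // Hypothetical: two partition columns, "dept" and "region,code"
        // (the latter created with a back-quoted identifier).
        String partColsProp = String.join(",", "dept", "region,code");
        String[] partCols = partColsProp.split(",");
        System.out.println(partCols.length);  // prints 3, not the intended 2
    }
}
```

This is why some serialization paths carry an explicit, non-comma column-name delimiter alongside the column list rather than assuming `,` is safe.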
##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +

Review comment:
   Can we do a similar check in production code as well? It would be 
good to warn the end user about this limitation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602180)
Time Spent: 2.5h  (was: 2h 20m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25164:
--
Labels: pull-request-available  (was: )

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?focusedWorklogId=602179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602179
 ]

ASF GitHub Bot logged work on HIVE-25164:
-

Author: ASF GitHub Bot
Created on: 26/May/21 08:04
Start Date: 26/May/21 08:04
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #2320:
URL: https://github.com/apache/hive/pull/2320


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602179)
Remaining Estimate: 0h
Time Spent: 10m

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-05-26 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-25164:

Labels:   (was: pull-request-available)
Status: Patch Available  (was: Open)

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602178
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:58
Start Date: 26/May/21 07:58
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639489095



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
##
@@ -262,6 +263,13 @@ public static TableDesc getTableDesc(
   properties.setProperty(serdeConstants.LIST_COLUMN_TYPES, columnTypes);
 }
 
+if (partCols != null && !partCols.isEmpty()) {
+  properties.setProperty(serdeConstants.LIST_PARTITION_COLUMNS, 
partCols.stream()

Review comment:
   Maybe a util method for this? We do the same concatenation somewhere in 
the Iceberg code. I do not really like having this spread around the code so much.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602178)
Time Spent: 2h 20m  (was: 2h 10m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602177
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:56
Start Date: 26/May/21 07:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639487411



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {
+    return Optional.ofNullable(conf).map(FileSinkDesc::getTableInfo)
+        .map(TableDesc::getProperties)
+        .map(props -> props.getProperty(hive_metastoreConstants.META_TABLE_STORAGE))
+        .map(handler -> {
+          try {
+            return HiveUtils.getStorageHandler(hconf, handler);
+          } catch (HiveException e) {
+            return null;

Review comment:
   What happens if we return null here? Would the next map fail? Or will 
`orElse` handle it if no matching handler is found?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602177)
Time Spent: 2h 10m  (was: 2h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602176
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:56
Start Date: 26/May/21 07:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639487411



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {
+    return Optional.ofNullable(conf).map(FileSinkDesc::getTableInfo)
+        .map(TableDesc::getProperties)
+        .map(props -> props.getProperty(hive_metastoreConstants.META_TABLE_STORAGE))
+        .map(handler -> {
+          try {
+            return HiveUtils.getStorageHandler(hconf, handler);
+          } catch (HiveException e) {
+            return null;

Review comment:
   What happens if we return null here? Would the next map fail?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602176)
Time Spent: 2h  (was: 1h 50m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602174
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:54
Start Date: 26/May/21 07:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639486344



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -736,6 +740,21 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 }
   }
 
+  private boolean skipPartitionCheck() {

Review comment:
   Maybe a comment would be good here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602174)
Time Spent: 1h 50m  (was: 1h 40m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602173
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:53
Start Date: 26/May/21 07:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639485639



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -556,15 +593,18 @@ public void testCTASFailureRollback() throws IOException {
 testTables.createTable(shell, "source", 
HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA,
 fileFormat, HiveIcebergStorageHandlerTestUtils.CUSTOMER_RECORDS);
 
-try {
-  shell.executeStatement(String.format("CREATE TABLE target STORED BY '%s' 
AS SELECT * FROM source",
-  HiveIcebergStorageHandler.class.getName()));
-} catch (Exception e) {
-  // expected error
-}
+String[] partitioningSchemes = {"" /* unpartitioned */, "PARTITIONED BY 
(dept)"};

Review comment:
   Maybe test for multiple partition columns?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602173)
Time Spent: 1h 40m  (was: 1.5h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602171
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:52
Start Date: 26/May/21 07:52
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639484779



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +
+"For other catalog types, the HiveIcebergSerDe will create the target 
Iceberg table in the correct catalog " +
+"using the Catalogs.createTable function, but will not register the 
table in HMS since those catalogs do not " +
+"use HiveTableOperations. This means that even though the CTAS query 
succeeds, the user would not be able to " +
+"query this new table from Hive, since HMS does not know about it.",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");
+
+shell.executeStatement(String.format(
+"CREATE TABLE target PARTITIONED BY (dept, name) " +
+"STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source",
+HiveIcebergStorageHandler.class.getName(),
+TableProperties.DEFAULT_FILE_FORMAT, fileFormat));
+
+// check table can be read back correctly
+List objects = shell.executeStatement("SELECT * FROM target 
ORDER BY id");

Review comment:
   `HiveIcebergTestUtils.validateData` also has ways to check the table 
records




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602171)
Time Spent: 1.5h  (was: 1h 20m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602169
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:51
Start Date: 26/May/21 07:51
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639483592



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +
+"For other catalog types, the HiveIcebergSerDe will create the target 
Iceberg table in the correct catalog " +
+"using the Catalogs.createTable function, but will not register the 
table in HMS since those catalogs do not " +
+"use HiveTableOperations. This means that even though the CTAS query 
succeeds, the user would not be able to " +
+"query this new table from Hive, since HMS does not know about it.",
+testTableType == TestTables.TestTableType.HIVE_CATALOG);
+
+shell.executeStatement("CREATE TABLE source (id bigint, name string) 
PARTITIONED BY (dept string) STORED AS ORC");
+shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 
'Linda', 'Finance')");

Review comment:
   Why not use `testTables.createTable` methods?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602169)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602168
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:50
Start Date: 26/May/21 07:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639482992



##
File path: 
iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -540,6 +540,43 @@ public void testCTASFromHiveTable() {
 Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, 
objects.get(1));
   }
 
+  @Test
+  public void testCTASPartitionedFromHiveTable() throws TException, 
InterruptedException {
+Assume.assumeTrue("CTAS target table is supported fully only for 
HiveCatalog tables." +

Review comment:
   Do we want to create a method for this long line of code? We already have 
this 3 times.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602168)
Time Spent: 1h 10m  (was: 1h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602167
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:49
Start Date: 26/May/21 07:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639481964



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -328,6 +345,7 @@ static void overlayTableProperties(Configuration 
configuration, TableDesc tableD
 map.put(InputFormatConfig.TABLE_IDENTIFIER, 
props.getProperty(Catalogs.NAME));
 map.put(InputFormatConfig.TABLE_LOCATION, table.location());
 map.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
+props.put(InputFormatConfig.PARTITION_SPEC, 
PartitionSpecParser.toJson(table.spec()));

Review comment:
   This is again something which is not by the spec. What happens if this 
change is not propagated? Changing props might not be allowed - even though it 
is working now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602167)
Time Spent: 1h  (was: 50m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602165
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:48
Start Date: 26/May/21 07:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639481964



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -328,6 +345,7 @@ static void overlayTableProperties(Configuration 
configuration, TableDesc tableD
 map.put(InputFormatConfig.TABLE_IDENTIFIER, 
props.getProperty(Catalogs.NAME));
 map.put(InputFormatConfig.TABLE_LOCATION, table.location());
 map.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
+props.put(InputFormatConfig.PARTITION_SPEC, 
PartitionSpecParser.toJson(table.spec()));

Review comment:
   This is again something which is not covered by the spec. What happens if this 
will not be propagated?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602165)
Time Spent: 40m  (was: 0.5h)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602166
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:48
Start Date: 26/May/21 07:48
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639481964



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -328,6 +345,7 @@ static void overlayTableProperties(Configuration 
configuration, TableDesc tableD
 map.put(InputFormatConfig.TABLE_IDENTIFIER, 
props.getProperty(Catalogs.NAME));
 map.put(InputFormatConfig.TABLE_LOCATION, table.location());
 map.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
+props.put(InputFormatConfig.PARTITION_SPEC, 
PartitionSpecParser.toJson(table.spec()));

Review comment:
   This is again something which is not by the spec. What happens if this 
change is not propagated?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602166)
Time Spent: 50m  (was: 40m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602162
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:47
Start Date: 26/May/21 07:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639481002



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -218,7 +235,15 @@ private static Schema hiveSchemaOrThrow(Properties 
serDeProperties, Exception pr
   throws SerDeException {
 // Read the configuration parameters
 String columnNames = 
serDeProperties.getProperty(serdeConstants.LIST_COLUMNS);
+// add partition columns to schema as well, if any
+if (serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS) != 
null) {
+  columnNames = columnNames + "," + 
serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS);
+}
 String columnTypes = 
serDeProperties.getProperty(serdeConstants.LIST_COLUMN_TYPES);
+// add partition column types to schema as well, if any
+if 
(serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMN_TYPES) != 
null) {
+  columnTypes = columnTypes + ":" + 
serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMN_TYPES);
+}

Review comment:
   nit: newline
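[Editor's note] The hunk quoted above appends the partition column names and types to the regular ones before the SerDe builds the schema; Hive's serde properties join column names with "," but column types with ":". A minimal standalone sketch of that merge follows; the property key values are hard-coded stand-ins for the serdeConstants fields and are illustrative only:

```java
import java.util.Properties;

public class SerDePropsDemo {
    // Stand-ins for serdeConstants keys; actual constant values are assumptions here.
    static final String LIST_COLUMNS = "columns";
    static final String LIST_COLUMN_TYPES = "columns.types";
    static final String LIST_PARTITION_COLUMNS = "partition_columns";
    static final String LIST_PARTITION_COLUMN_TYPES = "partition_columns.types";

    /** Returns {mergedNames, mergedTypes}, appending partition columns if present. */
    static String[] mergedSchema(Properties props) {
        String names = props.getProperty(LIST_COLUMNS);
        String types = props.getProperty(LIST_COLUMN_TYPES);
        if (props.getProperty(LIST_PARTITION_COLUMNS) != null) {
            // Column names are comma-separated in Hive serde properties.
            names = names + "," + props.getProperty(LIST_PARTITION_COLUMNS);
        }
        if (props.getProperty(LIST_PARTITION_COLUMN_TYPES) != null) {
            // Column types are colon-separated.
            types = types + ":" + props.getProperty(LIST_PARTITION_COLUMN_TYPES);
        }
        return new String[] {names, types};
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(LIST_COLUMNS, "id,name");
        p.setProperty(LIST_COLUMN_TYPES, "int:string");
        p.setProperty(LIST_PARTITION_COLUMNS, "dt");
        p.setProperty(LIST_PARTITION_COLUMN_TYPES, "string");
        String[] s = mergedSchema(p);
        System.out.println(s[0]); // id,name,dt
        System.out.println(s[1]); // int:string:string
    }
}
```

Note the mismatch in separators (comma for names, colon for types) is why the two appends in the patch use different delimiters.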




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602162)
Time Spent: 0.5h  (was: 20m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-05-26 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha reassigned HIVE-25164:
---


> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25161) Implement CTAS for partitioned Iceberg tables

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25161?focusedWorklogId=602161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602161
 ]

ASF GitHub Bot logged work on HIVE-25161:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:46
Start Date: 26/May/21 07:46
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2316:
URL: https://github.com/apache/hive/pull/2316#discussion_r639480737



##
File path: 
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##
@@ -218,7 +235,15 @@ private static Schema hiveSchemaOrThrow(Properties 
serDeProperties, Exception pr
   throws SerDeException {
 // Read the configuration parameters
 String columnNames = 
serDeProperties.getProperty(serdeConstants.LIST_COLUMNS);
+// add partition columns to schema as well, if any
+if (serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS) != 
null) {
+  columnNames = columnNames + "," + 
serDeProperties.getProperty(serdeConstants.LIST_PARTITION_COLUMNS);
+}

Review comment:
   nit: newline




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602161)
Time Spent: 20m  (was: 10m)

> Implement CTAS for partitioned Iceberg tables
> -
>
> Key: HIVE-25161
> URL: https://issues.apache.org/jira/browse/HIVE-25161
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-287) support count(*) and count distinct on multiple columns

2021-05-26 Thread edward henry (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351600#comment-17351600
 ] 

edward henry commented on HIVE-287:
---


> support count(*) and count distinct on multiple columns
> ---
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch, 
> HIVE-287-6-branch-0.6.patch, HIVE-287-6-trunk.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24956) Add debug logs for time taken in the incremental event processing

2021-05-26 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-24956.
-
Resolution: Fixed

Committed to master. Thanks for the patch, [~^sharma] !!

> Add debug logs for time taken in the incremental event processing
> -
>
> Key: HIVE-24956
> URL: https://issues.apache.org/jira/browse/HIVE-24956
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24956) Add debug logs for time taken in the incremental event processing

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24956?focusedWorklogId=602152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602152
 ]

ASF GitHub Bot logged work on HIVE-24956:
-

Author: ASF GitHub Bot
Created on: 26/May/21 07:35
Start Date: 26/May/21 07:35
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2135:
URL: https://github.com/apache/hive/pull/2135


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602152)
Time Spent: 1h 20m  (was: 1h 10m)

> Add debug logs for time taken in the incremental event processing
> -
>
> Key: HIVE-24956
> URL: https://issues.apache.org/jira/browse/HIVE-24956
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24956) Add debug logs for time taken in the incremental event processing

2021-05-26 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351588#comment-17351588
 ] 

Pravin Sinha commented on HIVE-24956:
-

+1

> Add debug logs for time taken in the incremental event processing
> -
>
> Key: HIVE-24956
> URL: https://issues.apache.org/jira/browse/HIVE-24956
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25163) java.time.temporal.UnsupportedTemporalTypeException when starting llap

2021-05-26 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned HIVE-25163:
--


> java.time.temporal.UnsupportedTemporalTypeException when starting llap
> --
>
> Key: HIVE-25163
> URL: https://issues.apache.org/jira/browse/HIVE-25163
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> When trying to start the LLAP service I get
> {noformat}
> java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: Year
>   at java.time.Instant.getLong(Instant.java:603)
>   at 
> java.time.format.DateTimePrintContext$1.getLong(DateTimePrintContext.java:205)
>   at 
> java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:298)
>   at 
> java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2551)
>   at 
> java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2190)
>   at 
> java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1746)
>   at 
> java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1720)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.startLlap(LlapServiceDriver.java:301)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.run(LlapServiceDriver.java:133)
>   at 
> org.apache.hadoop.hive.llap.cli.service.LlapServiceDriver.main(LlapServiceDriver.java:386)
> {noformat}
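
[Editor's note] The exception above is the standard java.time pitfall: an Instant carries no calendar fields (year, month, day) until a time zone is applied, so formatting it with a pattern that uses a year field fails. A minimal reproduction and the usual remedy; the exact pattern in LlapServiceDriver is an assumption here, only the failure mode is being illustrated:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.UnsupportedTemporalTypeException;

public class InstantFormatDemo {
    public static void main(String[] args) {
        Instant now = Instant.now();
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        // Formatting an Instant directly fails: Instant.getLong cannot
        // supply a year field, matching the stack trace above.
        try {
            fmt.format(now);
        } catch (UnsupportedTemporalTypeException e) {
            System.out.println("caught: " + e.getMessage());
        }

        // Remedy: attach a zone, either on the formatter...
        System.out.println(fmt.withZone(ZoneId.systemDefault()).format(now));
        // ...or by converting the Instant to a zoned value first.
        System.out.println(fmt.format(now.atZone(ZoneId.systemDefault())));
    }
}
```

Either form works because the zone conversion gives the formatter access to calendar fields.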



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=602114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602114
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 26/May/21 06:07
Start Date: 26/May/21 06:07
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2208:
URL: https://github.com/apache/hive/pull/2208#issuecomment-848488549


   Hey, @belugabehr, could you please take a look if you have a sec? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 602114)
Time Spent: 20m  (was: 10m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some start/end functions in the HMSHandler are incomplete. These functions can 
> audit user actions, monitor performance, and notify the listeners.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-05-26 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-25140:

Attachment: HIVE-25140.02.patch

> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch
>
>
> Infrastructure is included, except for exporters to Jaeger or OpenTelemetry 
> (OTel), due to Thrift and protobuf version conflicts; a logging-only exporter 
> is used instead. There are Spans for BeeLine and HiveServer2. The code was 
> developed on branch-3.1, and porting Spans to the Hive MetaStore on master is 
> taking more time due to major metastore code refactoring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)