[jira] [Work logged] (HIVE-24525) Invite reviewers automatically by file name patterns

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24525?focusedWorklogId=545842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545842
 ]

ASF GitHub Bot logged work on HIVE-24525:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 07:44
Start Date: 02/Feb/21 07:44
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1930:
URL: https://github.com/apache/hive/pull/1930


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545842)
Time Spent: 1h 10m  (was: 1h)

> Invite reviewers automatically by file name patterns
> 
>
> Key: HIVE-24525
> URL: https://issues.apache.org/jira/browse/HIVE-24525
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I wrote about this in an 
> [email|http://mail-archives.apache.org/mod_mbox/hive-dev/202006.mbox/%3c324a0a23-5841-09fe-a993-1a095035e...@rxd.hu%3e]
>  a long time ago...
> it could help in keeping an eye on some specific parts, e.g. thrift and 
> parser changes 
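The idea above can be sketched as a small matcher that maps file-name glob patterns to reviewer handles and collects every reviewer whose pattern matches a changed file. This is a minimal illustration, not Hive's actual implementation; the class and the pattern/reviewer names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: given a map from glob patterns to reviewer handles,
// invite every reviewer whose pattern matches one of the changed files.
public class ReviewerMatcher {
    public static List<String> reviewersFor(Map<String, List<String>> patternToReviewers,
                                            List<String> changedFiles) {
        List<String> invited = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : patternToReviewers.entrySet()) {
            PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + e.getKey());
            for (String file : changedFiles) {
                if (m.matches(Path.of(file))) {
                    for (String r : e.getValue()) {
                        if (!invited.contains(r)) {
                            invited.add(r);
                        }
                    }
                    break; // one matching file per pattern is enough
                }
            }
        }
        return invited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        rules.put("**/*.thrift", List.of("alice"));   // thrift changes
        rules.put("**/parse/**", List.of("bob"));     // parser changes
        List<String> files = List.of(
            "standalone-metastore/src/main/thrift/hive_metastore.thrift",
            "ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java");
        System.out.println(reversibleName(files, rules));
    }

    // small helper so main stays readable
    private static List<String> reversibleName(List<String> files,
                                               Map<String, List<String>> rules) {
        return reviewersFor(rules, files); // -> [alice, bob]
    }
}
```

GitHub's CODEOWNERS file implements a similar pattern-to-reviewer mapping natively, which is one alternative the thread could consider.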



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24295) Apply schema merge to all shared work optimizations

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24295?focusedWorklogId=545792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545792
 ]

ASF GitHub Bot logged work on HIVE-24295:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 05:01
Start Date: 02/Feb/21 05:01
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #1662:
URL: https://github.com/apache/hive/pull/1662#discussion_r568312799



##
File path: ql/src/test/results/clientpositive/llap/correlationoptimizer2.q.out
##
@@ -2668,6 +2684,8 @@ STAGE PLANS:
   input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
   output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+<<< HEAD

Review comment:
   Accidentally left here?

##
File path: ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out
##
@@ -1398,6 +1412,12 @@ STAGE PLANS:
 null sort order: z
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
+<<< HEAD

Review comment:
   Accidentally left here?

##
File path: ql/src/test/results/clientpositive/llap/except_distinct.q.out
##
@@ -481,20 +481,32 @@ STAGE PLANS:
   Select Operator
 expressions: _col0 (type: string), _col1 (type: string), 
_col3 (type: bigint), (_col2 * _col3) (type: bigint)
 outputColumnNames: _col0, _col1, _col2, _col3
+<<< HEAD

Review comment:
   Accidentally left here?

##
File path: ql/src/test/results/clientpositive/llap/subquery_exists_having.q.out
##
@@ -31,13 +31,18 @@ STAGE PLANS:
 Tez
  A masked pattern was here 
   Edges:
-Reducer 2 <- Map 1 (SIMPLE_EDGE)
-Reducer 3 <- Map 1 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
+Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
+Reducer 3 <- Map 1 (SIMPLE_EDGE)
  A masked pattern was here 
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
+<<< HEAD

Review comment:
   Accidentally left here?

##
File path: 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_2.q.out
##
@@ -183,16 +145,8 @@ STAGE PLANS:
 sort order: 
 Statistics: Num rows: 1 Data size: 516 Basic 
stats: COMPLETE Column stats: NONE
 value expressions: _col0 (type: decimal(34,16)), 
_col1 (type: decimal(34,16)), _col2 (type: tinyint), _col3 (type: tinyint), 
_col4 (type: bigint), _col5 (type: bigint), _col6 (type: binary)
-Execution mode: vectorized, llap
-LLAP IO: all inputs
-Map 7 
-Map Operator Tree:
-TableScan
-  alias: tt2
-  filterExpr: (timestamp_col_18 is not null and 
decimal1911_col_16 is not null and decimal1911_col_16 BETWEEN 
DynamicValue(RS_13_tt1_decimal2612_col_77_min) AND 
DynamicValue(RS_13_tt1_decimal2612_col_77_max) and 
in_bloom_filter(decimal1911_col_16, 
DynamicValue(RS_13_tt1_decimal2612_col_77_bloom_filter))) (type: boolean)
-  Statistics: Num rows: 1 Data size: 152 Basic stats: COMPLETE 
Column stats: NONE
   Filter Operator
-predicate: (timestamp_col_18 is not null and 
decimal1911_col_16 is not null and decimal1911_col_16 BETWEEN 
DynamicValue(RS_13_tt1_decimal2612_col_77_min) AND 
DynamicValue(RS_13_tt1_decimal2612_col_77_max) and 
in_bloom_filter(decimal1911_col_16, 
DynamicValue(RS_13_tt1_decimal2612_col_77_bloom_filter))) (type: boolean)

Review comment:
   Is semi join opportunity lost here?

##
File path: ql/src/test/results/clientpositive/llap/except_all.q.out
##
@@ -495,20 +495,32 @@ STAGE PLANS:
   Select Operator
 expressions: _col0 (type: string), _col1 (type: string), 
_col3 (type: bigint), (_col2 * _col3) (type: bigint)
 outputColumnNames: _col0, _col1, _col2, _col3
+<<< HEAD

Review comment:
   Accidentally left here?

##
File path: 
ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended2.q.out
##
@@ -562,39 +562,63 @@ STAGE PLANS:
 Map Operator Tree:
 TableScan
   alias: src
-  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
+  Statistics: Num rows: 500 Data size: 45500 Basic stats: 
COMPLETE Column stats: COMPLETE
   Select Operator
-expressions: key (type: string)
-outputColumnNames: key
-  

[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=545787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545787
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 04:17
Start Date: 02/Feb/21 04:17
Worklog Time Spent: 10m 
  Work Description: dataproc-metastore commented on pull request #1787:
URL: https://github.com/apache/hive/pull/1787#issuecomment-771345794


   > Thanks for your contribution! I modified the commit message a bit since I 
thought it was easier to understand (Driver is an overloaded term in Hive and 
means other things).
   
   Thank you Vihang for the help!





Issue Time Tracking
---

Worklog Id: (was: 545787)
Time Spent: 6h 10m  (was: 6h)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.





[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=545786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545786
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 04:16
Start Date: 02/Feb/21 04:16
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on pull request #1787:
URL: https://github.com/apache/hive/pull/1787#issuecomment-771345161


   Thanks for your contribution! I modified the commit message a bit since I 
thought it was easier to understand (Driver is an overloaded term in Hive and 
means other things).





Issue Time Tracking
---

Worklog Id: (was: 545786)
Time Spent: 6h  (was: 5h 50m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.





[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=545784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545784
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 04:15
Start Date: 02/Feb/21 04:15
Worklog Time Spent: 10m 
  Work Description: vihangk1 merged pull request #1787:
URL: https://github.com/apache/hive/pull/1787


   





Issue Time Tracking
---

Worklog Id: (was: 545784)
Time Spent: 5h 50m  (was: 5h 40m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.





[jira] [Work logged] (HIVE-24478) Subquery GroupBy with Distinct SemanticException: Invalid column reference

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24478?focusedWorklogId=545783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545783
 ]

ASF GitHub Bot logged work on HIVE-24478:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 04:11
Start Date: 02/Feb/21 04:11
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1732:
URL: https://github.com/apache/hive/pull/1732


   





Issue Time Tracking
---

Worklog Id: (was: 545783)
Time Spent: 40m  (was: 0.5h)

> Subquery GroupBy with Distinct SemanticException: Invalid column reference
> --
>
> Key: HIVE-24478
> URL: https://issues.apache.org/jira/browse/HIVE-24478
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE tmp_src1(
>   `npp` string,
>   `nsoc` string) stored as orc;
> INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');
> SELECT `min_nsoc`
> FROM
>  (SELECT `npp`,
>  MIN(`nsoc`) AS `min_nsoc`,
>  COUNT(DISTINCT `nsoc`) AS `nb_nsoc`
>   FROM tmp_src1
>   GROUP BY `npp`) `a`
> WHERE `nb_nsoc` > 0;
> {code}
> Issue:
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference 'nsoc' at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
> {code}
> The query runs fine when we include `nb_nsoc` in the SELECT expression list.





[jira] [Work logged] (HIVE-24664) Support column aliases in Values clause

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24664?focusedWorklogId=545778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545778
 ]

ASF GitHub Bot logged work on HIVE-24664:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 03:47
Start Date: 02/Feb/21 03:47
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1892:
URL: https://github.com/apache/hive/pull/1892#discussion_r568302310



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/type/RexNodeExprFactory.java
##
@@ -1000,6 +1003,21 @@ protected FunctionInfo getFunctionInfo(String funcName) 
throws SemanticException
 return functionHelper.getFunctionInfo(funcName);
   }
 
+  @Override
+  protected RexNode replaceFieldNamesInStruct(RexNode expr, List<String> newFieldNames) {
+    if (newFieldNames.isEmpty()) {
+      return expr;
+    }
+
+    RexCall structCall = (RexCall) expr;
+    List<RexNode> newOperands = structCall.operands.stream()
+        .filter(rexNode -> "_UTF-16LE'tok_alias':VARCHAR(2147483647) CHARACTER SET \"UTF-16LE\"".compareTo(rexNode.toString()) != 0)

Review comment:
   Have you considered skipping the token in `TypeCheckProcFactory` rather 
than here? I think you could do that if it would not extend `StrExprProcessor`. 
If you cannot, maybe you can return a 'placeholder' object (static final) that 
you can identify easily with == ?
   I think that would simplify the implementation.
   Also note that tokens are not handled directly by the `ExprFactory` 
implementation.
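The reviewer's sentinel suggestion can be sketched with plain-Java stand-ins: return one shared static-final placeholder object instead of recognizing the alias token by its string form, and later filter it out with a reference comparison (`==`). The names below (`Expr`, `ALIAS_PLACEHOLDER`, `SentinelDemo`) are hypothetical, not Hive's actual `RexNode` API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the sentinel-object pattern: one canonical placeholder instance,
// detected by identity (==) rather than by comparing its printed form.
public class SentinelDemo {
    interface Expr {}

    // Single shared instance; identity comparison is enough to detect it.
    static final Expr ALIAS_PLACEHOLDER = new Expr() {};

    static List<Expr> dropPlaceholders(List<Expr> operands) {
        return operands.stream()
                .filter(op -> op != ALIAS_PLACEHOLDER) // reference comparison, as suggested
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Expr real = new Expr() {};
        List<Expr> ops = List.of(ALIAS_PLACEHOLDER, real);
        System.out.println(dropPlaceholders(ops).size()); // prints 1
    }
}
```

A sentinel avoids the fragility of matching the token's `toString()` output, which can change with charset or type-system details (as the `_UTF-16LE'tok_alias'` literal in the diff illustrates).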
   







Issue Time Tracking
---

Worklog Id: (was: 545778)
Time Spent: 1h 40m  (was: 1.5h)

> Support column aliases in Values clause
> ---
>
> Key: HIVE-24664
> URL: https://issues.apache.org/jira/browse/HIVE-24664
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Enable explicitly specifying column aliases in the first row of the Values clause. 
> If not all the columns have an alias specified, generate one.
> {code:java}
> values(1, 2 b, 3 c),(4, 5, 6);
> {code}
> {code:java}
> _col1   b   c
>   1 2   3
>   4 5   6
> {code}
>  This is not a standard SQL feature, but some database engines like Impala 
> support it.
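The alias-filling rule described above can be sketched as follows: take the aliases given in the first VALUES row (null where omitted) and generate `_colN` names for the missing ones, mirroring the `_col1 b c` header in the example. The class name and the exact `_colN` numbering are assumptions based on that example, not Hive's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fill in generated _colN aliases for columns that have none.
public class ValuesAliases {
    static List<String> fillAliases(List<String> given) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < given.size(); i++) {
            String a = given.get(i);
            // keep an explicit alias; otherwise generate one from the position
            out.add(a != null ? a : "_col" + (i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // values(1, 2 b, 3 c) -> the first column has no alias
        List<String> given = new ArrayList<>();
        given.add(null);
        given.add("b");
        given.add("c");
        System.out.println(fillAliases(given)); // prints [_col1, b, c]
    }
}
```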





[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread LinZhongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276829#comment-17276829
 ] 

LinZhongwei commented on HIVE-24711:


This is the source code. Is FileSystem.closeAllForUGI(ugi) missing?

 

final UserGroupInformation ugi;
try {
  ugi = UserGroupInformation.getCurrentUser();
} catch (IOException e) {
  throw new RuntimeException(e);
}

partFutures.add(threadPool.submit(new Callable<Partition>() {
  @Override
  public Partition call() throws Exception {
    ugi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        try {
          boolean madeDir = createLocationForAddedPartition(table, part);
          if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
            // Technically, for ifNotExists case, we could insert one and discard the other
            // because the first one now "exists", but it seems better to report the problem
            // upstream as such a command doesn't make sense.
            throw new MetaException("Duplicate partitions in the list: " + part);
          }
          initializeAddedPartition(table, part, madeDir);
        } catch (MetaException e) {
          throw new IOException(e.getMessage(), e);
        }
        return null;
      }
    });
    return part;
  }
}));
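The suspected failure mode behind the question above (a missing `FileSystem.closeAllForUGI(ugi)` call) can be sketched with pure JDK stand-ins: Hadoop caches `FileSystem` instances keyed in part by the caller's `UserGroupInformation`, so each fresh UGI adds a cache entry that only an explicit close-all removes. The names below (`UgiCacheDemo`, `closeAllFor`) are stand-ins, not Hadoop's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Hadoop's FileSystem cache, keyed (in part) by the caller's
// UGI. Each distinct key adds an entry that only an explicit cleanup call
// (modelled here by closeAllFor) removes; skipping it leaks one entry per UGI.
public class UgiCacheDemo {
    static final Map<Object, String> CACHE = new HashMap<>();

    static String get(Object ugi) {
        // like FileSystem.get(): create and cache an instance per key
        return CACHE.computeIfAbsent(ugi, k -> "fs-for-" + k);
    }

    static void closeAllFor(Object ugi) {
        // like FileSystem.closeAllForUGI(ugi): drop the cached entries
        CACHE.remove(ugi);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            Object ugi = new Object(); // fresh identity per task, like a new UGI per request
            get(ugi);
            // without closeAllFor(ugi) here, CACHE keeps growing
        }
        System.out.println(CACHE.size()); // prints 3: three leaked entries
    }
}
```

If the per-partition `doAs` path in the snippet above really never closes the cached file systems for each UGI, this growth pattern would match the heap behavior reported in the issue, though confirming that requires a heap dump of the metastore.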

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version:3.1.5.31-1
> hive version:3.1.0.3.1.5.31-1
> hadoop version:3.1.1.3.1.5.31-1
> We find that the hive metastore has a memory leak if we set 
> compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the hive metastore to 40 GB, after 1 month the 
> hive metastore service will go down with an OutOfMemoryError.





[jira] [Work logged] (HIVE-24073) Execution exception in sort-merge semijoin

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24073?focusedWorklogId=545764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545764
 ]

ASF GitHub Bot logged work on HIVE-24073:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 03:16
Start Date: 02/Feb/21 03:16
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1476:
URL: https://github.com/apache/hive/pull/1476#issuecomment-771322316


   @maheshk114 , can we resolve conflicts so we can merge? Thanks





Issue Time Tracking
---

Worklog Id: (was: 545764)
Time Spent: 50m  (was: 40m)

> Execution exception in sort-merge semijoin
> --
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Reporter: Jesus Camacho Rodriguez
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to 
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite 
> nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>   ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to 
> overwrite nextKeyWritables[1]
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>   at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>   at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>   ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in 
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been 
> merged.





[jira] [Updated] (HIVE-24718) Cleanup of _external_table_info file

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24718:
--
Labels: pull-request-available  (was: )

> Cleanup of _external_table_info file
> 
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24718) Cleanup of _external_table_info file

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=545745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545745
 ]

ASF GitHub Bot logged work on HIVE-24718:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 02:43
Start Date: 02/Feb/21 02:43
Worklog Time Spent: 10m 
  Work Description: ArkoSharma opened a new pull request #1936:
URL: https://github.com/apache/hive/pull/1936


   





Issue Time Tracking
---

Worklog Id: (was: 545745)
Remaining Estimate: 0h
Time Spent: 10m

> Cleanup of _external_table_info file
> 
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24718) Cleanup of _external_table_info file

2021-02-01 Thread Arko Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arko Sharma reassigned HIVE-24718:
--


> Cleanup of _external_table_info file
> 
>
> Key: HIVE-24718
> URL: https://issues.apache.org/jira/browse/HIVE-24718
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>






[jira] [Comment Edited] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread LinZhongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276772#comment-17276772
 ] 

LinZhongwei edited comment on HIVE-24711 at 2/2/21, 2:30 AM:
-

Our hive metastore just enables storage-based authorization, and I found these 
error messages in hivemetastore.log.

2021-02-02T09:32:30,456 ERROR [PartitionDiscoveryTask-0]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(197)) - 
MetaException(message:java.security.AccessControlException: Permission denied: 
user=hive, access=WRITE, 
inode="/apps/finance/fdop/fdop_final_dev/fdop_dim_pda_recon_delta/batch_date=2020-11-11/batch_seq_num=5":gp_fin_fdop_batch:gp_fin_fdop_batch:drwxr-xr-x
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1859)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1843)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1793)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:7804)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:2217)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1659)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
 at 
org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.metaException(AuthorizationPreEventListener.java:430)
 at 
org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.authorizeAddPartition(AuthorizationPreEventListener.java:343)
 at 
org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.onEvent(AuthorizationPreEventListener.java:156)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(HiveMetaStore.java:3672)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:3841)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:4010)
 at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 at com.sun.proxy.$Proxy30.add_partitions_req(Unknown Source)
 at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:760)
 at org.apache.hadoop.hive.metastore.Msck$1.execute(Msck.java:388)
 at org.apache.hadoop.hive.metastore.Msck$1.execute(Msck.java:360)
 at 
org.apache.hadoop.hive.metastore.utils.RetryUtilities$ExponentiallyDecayingBatchWork.run(RetryUtilities.java:91)

at 
org.apache.hadoop.hive.metastore.Msck.createPartitionsInBatches(Msck.java:398)
 at org.apache.hadoop.hive.metastore.Msck.repair(Msck.java:209)
 at 
org.apache.hadoop.hive.metastore.PartitionManagementTask$MsckThread.run(PartitionManagementTask.java:224)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.security.AccessControlException: Permission denied: user=hive, 
access=WRITE, 
inode="/apps/finance/fdop/fdop_final_dev/fdop_dim_pda_recon_delta/batch_date=2020-11-11/batch_seq_num=5":gp_fin_fdop_batch:gp_fin_fdop_batch:drwxr-xr-x
 at 

[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread LinZhongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276786#comment-17276786
 ] 

LinZhongwei commented on HIVE-24711:


I will try to turn on Ranger-based authorization.

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version:3.1.5.31-1
> hive version:3.1.0.3.1.5.31-1
> hadoop version:3.1.1.3.1.5.31-1
> We find that the hive metastore has a memory leak if we set 
> compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the hive metastore to 40 GB, after 1 month the 
> hive metastore service will go down with an OutOfMemoryError.





[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread LinZhongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276783#comment-17276783
 ] 

LinZhongwei commented on HIVE-24711:


Here is the authorization-related configuration in hive-site.xml:

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>

Here is hivemetastore-site.xml:

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>10</value>
  </property>
  <property>
    <name>hive.metastore.dml.events</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.event.listeners</name>
    <value></value>
  </property>
  <property>
    <name>hive.metastore.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.transactional.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
  <property>
    <name>hive.server2.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.service.metrics.hadoop2.component</name>
    <value>hivemetastore</value>
  </property>
  <property>
    <name>hive.service.metrics.reporter</name>
    <value>HADOOP2</value>
  </property>
</configuration>

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version:3.1.5.31-1
> hive version:3.1.0.3.1.5.31-1
> hadoop version:3.1.1.3.1.5.31-1
> We find that the hive metastore has a memory leak if we set 
> compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the hive metastore to 40 GB, after 1 month the 
> hive metastore service will go down with an OutOfMemoryError.





[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread LinZhongwei (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276772#comment-17276772
 ] 

LinZhongwei commented on HIVE-24711:


Our hive metastore just enables storage-based authorization, and I found these 
error messages in hivemetastore.log.

 

 

Caused by: java.security.AccessControlException: Permission denied: user=hive, 
access=WRITE, 
inode="/apps/finance/fdop/fdop_stg/fdop_ft_etl_stg/batch_date=2020-07-07/batch_seq_num=5":gp_fin_fdop_batch:gp_fin_fdop_batch:drwxr-xr-x
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
 at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1859)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1843)
 at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1793)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:7804)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:2217)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1659)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

at 
org.apache.hadoop.hive.shims.Hadoop23Shims.wrapAccessException(Hadoop23Shims.java:947)
 ~[hive-exec-3.1.0.3.1.5.31-1.jar:3.1.0.3.1.5.31-1]
 at 
org.apache.hadoop.hive.shims.Hadoop23Shims.checkFileAccess(Hadoop23Shims.java:931)
 ~[hive-exec-3.1.0.3.1.5.31-1.jar:3.1.0.3.1.5.31-1]
 at 
org.apache.hadoop.hive.common.FileUtils.checkFileAccessWithImpersonation(FileUtils.java:402)
 ~[hive-common-3.1.0.3.1.5.31-1.jar:3.1.0.3.1.5.31-1]
 at 
org.apache.hadoop.hive.common.FileUtils.checkFileAccessWithImpersonation(FileUtils.java:370)
 ~[hive-common-3.1.0.3.1.5.31-1.jar:3.1.0.3.1.5.31-1]

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version:3.1.5.31-1
> hive version:3.1.0.3.1.5.31-1
> hadoop version:3.1.1.3.1.5.31-1
> We find that the Hive metastore has a memory leak if we set 
> compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the Hive metastore to 40 GB, after 1 month 
> the metastore service will go down with an OutOfMemoryError.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=545713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545713
 ]

ASF GitHub Bot logged work on HIVE-19253:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 00:54
Start Date: 02/Feb/21 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1537:
URL: https://github.com/apache/hive/pull/1537


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545713)
Time Spent: 2h 40m  (was: 2.5h)

> HMS ignores tableType property for external tables
> --
>
> Key: HIVE-19253
> URL: https://issues.apache.org/jira/browse/HIVE-19253
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0, 3.0.0, 4.0.0
>Reporter: Alex Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, 
> HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, 
> HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, 
> HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, 
> HIVE-19253.11.patch, HIVE-19253.12.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When someone creates a table using the Thrift API, they may think that setting 
> tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their 
> table is gone later, because HMS will silently change it to a managed table.
> here is the offending code:
> {code:java}
>   private MTable convertToMTable(Table tbl) throws InvalidObjectException,
>   MetaException {
> ...
> // If the table has property EXTERNAL set, update table type
> // accordingly
> String tableType = tbl.getTableType();
> boolean isExternal = 
> Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
> if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>   if (isExternal) {
> tableType = TableType.EXTERNAL_TABLE.toString();
>   }
> }
> if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>   if (!isExternal) { // Here!
> tableType = TableType.MANAGED_TABLE.toString();
>   }
> }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed 
> even if it was external in the first place - which is wrong.
> Moreover, in some places the code looks at the table property to decide the 
> table type, and in other places it looks at the parameter. HMS should really 
> make up its mind which one to use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=545714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545714
 ]

ASF GitHub Bot logged work on HIVE-24430:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 00:54
Start Date: 02/Feb/21 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1709:
URL: https://github.com/apache/hive/pull/1709


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545714)
Time Spent: 1h  (was: 50m)

> DiskRangeInfo should make use of DiskRangeList
> --
>
> Key: HIVE-24430
> URL: https://issues.apache.org/jira/browse/HIVE-24430
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DiskRangeInfo should make use of DiskRangeList instead of List – 
> this will help us transition to ORC 1.6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=545712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545712
 ]

ASF GitHub Bot logged work on HIVE-24324:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 00:54
Start Date: 02/Feb/21 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1711:
URL: https://github.com/apache/hive/pull/1711


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545712)
Time Spent: 1h 40m  (was: 1.5h)

> Remove deprecated API usage from Avro
> -
>
> Key: HIVE-24324
> URL: https://issues.apache.org/jira/browse/HIVE-24324
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8, 4.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
> removed since Avro 1.9. This replaces its usage with {{getObjectProp}}, which 
> doesn't leak a Jackson JSON node. This will help downstream apps depend on 
> Hive while using a higher version of Avro, and also help Hive upgrade its own 
> Avro version.
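The design point here, returning a plain Object instead of a type from a third-party serialization library, can be shown with a toy holder. This is not Avro code; the class below is an illustrative stand-in for the JsonProperties accessor pattern.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy illustration of the API change described above: the accessor returns a
 * plain Object rather than a Jackson JsonNode, so callers are no longer
 * coupled to the serialization library's types.
 */
public class PropHolder {
    private final Map<String, Object> props = new HashMap<>();

    public void addProp(String name, Object value) {
        props.put(name, value);
    }

    /** Analogous in spirit to getObjectProp: no library type in the signature. */
    public Object getObjectProp(String name) {
        return props.get(name); // null when the property is absent
    }
}
```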



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23785) Databases, Catalogs and Partitions should have unique id

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23785:
--
Labels: pull-request-available  (was: )

> Databases, Catalogs and Partitions should have unique id
> 
>
> Key: HIVE-23785
> URL: https://issues.apache.org/jira/browse/HIVE-23785
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-20556 introduced an id field to the Table object. This is useful 
> information, since a table which is dropped and recreated with the same name 
> will have a different id. If an HMS client is caching such a table object, 
> the id can be used to determine whether the table present on the client side 
> matches the one in the HMS.
> We can expand this idea to other HMS objects like Databases, Catalogs and 
> Partitions and add a new id field.
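The cache-validation idea in the description can be sketched in a few lines. This is an illustrative client-side structure, not part of the HMS API; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a client-side cache keyed by object name and
 * validated by the proposed unique id: a drop-and-recreate under the same
 * name yields a new id, so the stale cached entry is detected.
 */
public class IdValidatedCache {
    private final Map<String, Long> idsByName = new HashMap<>();

    public void remember(String name, long id) {
        idsByName.put(name, id);
    }

    /** True only if we cached this name and the metastore still reports the same id. */
    public boolean isStillValid(String name, long currentIdFromMetastore) {
        Long cachedId = idsByName.get(name);
        return cachedId != null && cachedId == currentIdFromMetastore;
    }
}
```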



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23785) Databases, Catalogs and Partitions should have unique id

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23785?focusedWorklogId=545706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545706
 ]

ASF GitHub Bot logged work on HIVE-23785:
-

Author: ASF GitHub Bot
Created on: 02/Feb/21 00:28
Start Date: 02/Feb/21 00:28
Worklog Time Spent: 10m 
  Work Description: vihangk1 opened a new pull request #1935:
URL: https://github.com/apache/hive/pull/1935


   ### What changes were proposed in this pull request?
   This change exposes an id field on the Database, Catalog and Partition thrift 
objects. The id was always present internally in the metastore, but now we start 
exposing it in the thrift interface. This could be very useful to uniquely 
identify a table or database. For example, a database which is dropped and 
recreated with the same name will have a different id. The id field is already 
present in the table objects; this PR adds it to Database, Partition and 
Catalog as well.
   
   ### Why are the changes needed?
   This is an enhancement which could be useful for clients who want to know if 
the object they have is the same as the table present in the metastore. 
Currently, if another client drops and recreates the object with the same name, 
there is no way for the client to know the difference.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Modified existing tests which assert that id is present in the Database, 
Catalog and Partitions.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545706)
Remaining Estimate: 0h
Time Spent: 10m

> Databases, Catalogs and Partitions should have unique id
> 
>
> Key: HIVE-23785
> URL: https://issues.apache.org/jira/browse/HIVE-23785
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-20556 introduced an id field to the Table object. This is useful 
> information, since a table which is dropped and recreated with the same name 
> will have a different id. If an HMS client is caching such a table object, 
> the id can be used to determine whether the table present on the client side 
> matches the one in the HMS.
> We can expand this idea to other HMS objects like Databases, Catalogs and 
> Partitions and add a new id field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23785) Databases, Catalogs and Partitions should have unique id

2021-02-01 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23785:
---
Priority: Major  (was: Minor)

> Databases, Catalogs and Partitions should have unique id
> 
>
> Key: HIVE-23785
> URL: https://issues.apache.org/jira/browse/HIVE-23785
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> HIVE-20556 introduced an id field to the Table object. This is useful 
> information, since a table which is dropped and recreated with the same name 
> will have a different id. If an HMS client is caching such a table object, 
> the id can be used to determine whether the table present on the client side 
> matches the one in the HMS.
> We can expand this idea to other HMS objects like Databases, Catalogs and 
> Partitions and add a new id field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23785) Databases, Catalogs and Partitions should have unique id

2021-02-01 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-23785:
---
Summary: Databases, Catalogs and Partitions should have unique id  (was: 
Database should have a unique id)

> Databases, Catalogs and Partitions should have unique id
> 
>
> Key: HIVE-23785
> URL: https://issues.apache.org/jira/browse/HIVE-23785
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> HIVE-20556 introduced an id field to the Table object. This is useful 
> information, since a table which is dropped and recreated with the same name 
> will have a different id. If an HMS client is caching such a table object, 
> the id can be used to determine whether the table present on the client side 
> matches the one in the HMS.
> We can expand this idea to other HMS objects like Databases, Catalogs and 
> Partitions and add a new id field.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24712) hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit

2021-02-01 Thread liuyan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuyan updated HIVE-24712:
--
Description: 
 When both params are set to false, the result seems incorrect: a query that 
should return 200 rows now returns only 35 rows. This was tested on HDP 
3.1.5.


set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk  order by cs_sold_date_sk 
limit 200;


--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 ..........  llap SUCCEEDED  33  33  0  0  0  0
Reducer 2 ......  llap SUCCEEDED   4   4  0  0  0  0
Reducer 3 ......  llap SUCCEEDED   4   4  0  0  0  0
Reducer 4 ......  llap SUCCEEDED   1   1  0  0  0  0
--
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 38.23 s
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :  Map 1      38097.00  0  0  143,997,065  57,447
INFO  :  Reducer 2   9003.00  0  0  57,447  13,108
INFO  :  Reducer 3      0.00  0  0  13,108  35
INFO  :  Reducer 4      0.00  0  0  35  0
INFO  : 
--
INFO  : 
INFO  : LLAP IO Summary


 

 

set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk  order by cs_sold_date_sk 
limit 200;
--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 ..........  llap SUCCEEDED  33  33  0  0  0  0
Reducer 2 ......  llap SUCCEEDED   4   4  0  0  0  0
Reducer 3 ......  llap SUCCEEDED   2   2  0  0  0  0
Reducer 4 ......  llap SUCCEEDED   1   1  0  0  0  0
--
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 36.24 s
--


INFO  : 
--
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :  Map 1      25595.00  0  0  143,997,065  16,703,757
INFO  :  Reducer 2  18556.00  0  0  16,703,757  800
INFO  :  Reducer 3   8018.00  0  0  800  200
INFO  :  Reducer 4      0.00  0  0  200  0
INFO  : 
--
INFO  : 

  was:
 When Both param set to false , seems the result is not correct, only 35 rows. 
This is tested on HDP 3.1.5


--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 ..  llap SUCCEEDED 33 3300
   0   0  
Reducer 2 ..  llap SUCCEEDED  4  400
  

[jira] [Work logged] (HIVE-24717) Migrate to listStatusIterator in moving files

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24717?focusedWorklogId=545693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545693
 ]

ASF GitHub Bot logged work on HIVE-24717:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 23:33
Start Date: 01/Feb/21 23:33
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1934:
URL: https://github.com/apache/hive/pull/1934


   Change-Id: I7f718cb368c62cfbbc1ab80d7f5b9877391f5611
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545693)
Remaining Estimate: 0h
Time Spent: 10m

> Migrate to listStatusIterator in moving files
> -
>
> Key: HIVE-24717
> URL: https://issues.apache.org/jira/browse/HIVE-24717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive.java has various calls to the HDFS listStatus call when moving 
> files/directories around. These code paths are used for insert overwrite 
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a 
> RemoteIterator and fetches pages in the background. Hive should take 
> advantage of that, since Hadoop recently implemented listStatusIterator for 
> S3: https://issues.apache.org/jira/browse/HADOOP-17074
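The eager-versus-iterator distinction above can be illustrated with the local filesystem. This is a java.nio analogy, not the Hadoop FileSystem API: an eager listing materializes the whole result before the caller sees anything, while an iterator-backed listing (as with RemoteIterator behind listStatusIterator) yields entries as they arrive.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem sketch of lazy, iterator-style directory listing. */
public class LazyListing {
    public static int countLazily(Path dir) throws IOException {
        int n = 0;
        // DirectoryStream hands entries to the caller one at a time,
        // analogous to consuming a RemoteIterator page by page.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                n++; // each entry is processed as it is produced
            }
        }
        return n;
    }

    /** Creates a temp dir with two files and counts them; -1 on IO failure. */
    public static int demo() {
        try {
            Path dir = Files.createTempDirectory("lazy-listing-demo");
            Files.createFile(dir.resolve("a"));
            Files.createFile(dir.resolve("b"));
            return countLazily(dir);
        } catch (IOException e) {
            return -1;
        }
    }
}
```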



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24717) Migrate to listStatusIterator in moving files

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24717:
--
Labels: pull-request-available  (was: )

> Migrate to listStatusIterator in moving files
> -
>
> Key: HIVE-24717
> URL: https://issues.apache.org/jira/browse/HIVE-24717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive.java has various calls to the HDFS listStatus call when moving 
> files/directories around. These code paths are used for insert overwrite 
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a 
> RemoteIterator and fetches pages in the background. Hive should take 
> advantage of that, since Hadoop recently implemented listStatusIterator for 
> S3: https://issues.apache.org/jira/browse/HADOOP-17074



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24717) Migrate to listStatusIterator in moving files

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa İman reassigned HIVE-24717:
---


> Migrate to listStatusIterator in moving files
> -
>
> Key: HIVE-24717
> URL: https://issues.apache.org/jira/browse/HIVE-24717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>
> Hive.java has various calls to the HDFS listStatus call when moving 
> files/directories around. These code paths are used for insert overwrite 
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a 
> RemoteIterator and fetches pages in the background. Hive should take 
> advantage of that, since Hadoop recently implemented listStatusIterator for 
> S3: https://issues.apache.org/jira/browse/HADOOP-17074



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545689
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 23:16
Start Date: 01/Feb/21 23:16
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771227835


   > All q.out files show a data size increase for tables. Since most of them are 
consistently an additional 4 bytes per row, that does not seem like a bug. However, I 
found some irregular increases too, like 16 bytes per row. Can you explain why the 
data size increased, so we can check the irregularities and make sure they are 
expected?
   
   Hey @mustafaiman -- the main size differences are on Timestamp columns, where 
we now support nanosecond precision (using 2 extra variables for the lower and 
the upper precision as part of the stats -- see 
[ORC-611](https://issues.apache.org/jira/browse/ORC-611)).
   
   Other than that, there are other changes that can also affect size, such as 
trimming StringStatistics minimum and maximum values as part of ORC-203, or the 
List and Map column statistics that were recently added as part of ORC-398.
   
   Happy to check further if you have doubts about a particular query.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545689)
Time Spent: 7h 40m  (was: 7.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X version, and in order to take advantage 
> of the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g. to retrieve and store file footers, tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545684
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 23:05
Start Date: 01/Feb/21 23:05
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568204044



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final 
int columnIndex,
 .setColumnEncoding(columnEncoding)
 .setVectors(vectors)
 .setContext(context)
+.setIsInstant(columnType.getCategory() == TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
   Even though TimeStamp with local timezone was added as part of 
[ORC-189](https://issues.apache.org/jira/browse/ORC-189) 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545684)
Time Spent: 7.5h  (was: 7h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X version, and in order to take advantage 
> of the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g. to retrieve and store file footers, tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545680
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:58
Start Date: 01/Feb/21 22:58
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568200811



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   I see ORC strictly enforces this now. I would set the appropriate 
setting at the Hive-ORC boundary and leave LLAP_ALLOCATOR_MAX_ALLOC as it is 
(Math.min(llap.allocator.max, what ORC enforces)). If you think we should set 
LLAP_ALLOCATOR_MAX_ALLOC to be the same as what ORC enforces, that can be done 
in a separate ticket. Like you said, this is orthogonal to the ORC version bump, 
and therefore should be discussed in its own ticket.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545680)
Time Spent: 7h 20m  (was: 7h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X version, and in order to take advantage 
> of the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g. to retrieve and store file footers, tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24443) Optimise VectorSerializeRow for primitives

2021-02-01 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-24443.
-
Resolution: Fixed

Fixed via HIVE-24503

> Optimise VectorSerializeRow for primitives
> --
>
> Key: HIVE-24443
> URL: https://issues.apache.org/jira/browse/HIVE-24443
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: Screenshot 2020-11-30 at 9.39.31 AM.png
>
>
> !Screenshot 2020-11-30 at 9.39.31 AM.png|width=826,height=477!
>  
> One option could be to have a specific serializer embedded in the "Field" 
> object within VectorSerializeRow. This would avoid the unwanted switching on 
> every row.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSerializeRow.java#L63]
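The suggestion above, resolving the type switch once at column setup and storing a dedicated serializer in the Field, can be sketched in miniature. This is not Hive's VectorSerializeRow; all names below are illustrative.

```java
import java.util.function.Function;

/**
 * Minimal sketch of per-field serializer binding: the type switch runs once
 * when the Field is built, so the per-row hot loop just invokes the stored
 * serializer instead of re-evaluating a switch for every value.
 */
public class FieldSerializerSketch {
    static final class Field {
        final Function<Object, String> serializer; // chosen once per column
        Field(Function<Object, String> serializer) { this.serializer = serializer; }
    }

    static Field fieldFor(String typeName) {
        switch (typeName) {           // evaluated once, not per row
            case "int":    return new Field(v -> Integer.toString((Integer) v));
            case "string": return new Field(v -> (String) v);
            default:       throw new IllegalArgumentException(typeName);
        }
    }

    static String serializeRowValue(Field field, Object value) {
        return field.serializer.apply(value); // no switch in the hot path
    }
}
```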



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24654) Table level replication support for Atlas metadata

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24654?focusedWorklogId=545666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545666
 ]

ASF GitHub Bot logged work on HIVE-24654:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:32
Start Date: 01/Feb/21 22:32
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1883:
URL: https://github.com/apache/hive/pull/1883#discussion_r568186785



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/atlas/AtlasRequestBuilder.java
##
@@ -105,6 +126,50 @@ private String getQualifiedName(String clusterName, String 
srcDb) {
 return qualifiedName;
   }
 
+  private String getQualifiedName(String clusterName, String srcDB, String 
tableName) {
+String qualifiedTableName = 
String.format(QUALIFIED_NAME_HIVE_TABLE_FORMAT, srcDB, tableName);
+return getQualifiedName(clusterName,  qualifiedTableName);
+  }
+
+  private List<String> getQualifiedNames(String clusterName, String srcDb, 
Path listOfTablesFile, HiveConf conf)
+  throws SemanticException {
+List<String> qualifiedNames = new ArrayList<>();
+List<String> tableNames = getFileAsList(listOfTablesFile, conf);
+if (CollectionUtils.isEmpty(tableNames)) {
+  LOG.info("Empty file encountered: {}", listOfTablesFile);
+  return qualifiedNames;
+}
+for (String tableName : tableNames) {
+  qualifiedNames.add(getQualifiedName(clusterName, srcDb, tableName));
+}
+return qualifiedNames;
+  }
+
+  public List<String> getFileAsList(Path listOfTablesFile, HiveConf conf) 
throws SemanticException {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545666)
Time Spent: 1h 10m  (was: 1h)

> Table level replication support for Atlas metadata
> --
>
> Key: HIVE-24654
> URL: https://issues.apache.org/jira/browse/HIVE-24654
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24654.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Covers mainly Atlas export API payload change required to support table level 
> replication



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545643
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:09
Start Date: 01/Feb/21 22:09
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568174337



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
 
   
   Please check how this is propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized buffers and the ORC writer, as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking if the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal to the ORC version bump.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545643)
Time Spent: 7h  (was: 6h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545644
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 22:09
Start Date: 01/Feb/21 22:09
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568174337



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   LLAP_ALLOCATOR_MAX_ALLOC is used both for the LowLevelCacheImpl 
(buddyAllocator) and bufferSize on 
[WriterOptions](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L1553)
  Please check how this is propagated from 
[SerDeEncodedDataReader](https://github.com/apache/hive/blob/da1aa077716a65c2a02d850828b16cdeece1f574/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java#L248)
 
   
   Llap is tightly coupled to ORC, thus it could make sense to use the same 
buffer size for serialized buffers and the ORC writer, as we would not need to 
split/merge them  -- however I have nothing against splitting the conf or 
checking if the 8Mb limit is a hard one.
   All I am trying to say here is that this is orthogonal to the ORC version bump.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545644)
Time Spent: 7h 10m  (was: 7h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545623
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:54
Start Date: 01/Feb/21 21:54
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568166455



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,440 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec,
+  RecordReaderUtils.ByteBufferAllocatorPool pool) throws IOException {
+return codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec)
+.isAvailable()) ? null : SHIMS.getZeroCopyReader(file, pool);

Review comment:
   Good catch, FIXed thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545623)
Time Spent: 6h 50m  (was: 6h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545617
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:45
Start Date: 01/Feb/21 21:45
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568161096



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/LlapRecordReaderUtils.java
##
@@ -0,0 +1,440 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.llap.io.encoded;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.io.DiskRangeList;
+import org.apache.hadoop.hive.ql.io.orc.encoded.LlapDataReader;
+import org.apache.orc.CompressionCodec;
+import org.apache.orc.CompressionKind;
+import org.apache.orc.OrcFile;
+import org.apache.orc.OrcProto;
+import org.apache.orc.StripeInformation;
+import org.apache.orc.TypeDescription;
+import org.apache.orc.impl.BufferChunk;
+import org.apache.orc.impl.DataReaderProperties;
+import org.apache.orc.impl.DirectDecompressionCodec;
+import org.apache.orc.impl.HadoopShims;
+import org.apache.orc.impl.HadoopShimsFactory;
+import org.apache.orc.impl.InStream;
+import org.apache.orc.impl.OrcCodecPool;
+import org.apache.orc.impl.OrcIndex;
+import org.apache.orc.impl.RecordReaderUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.function.Supplier;
+
+public class LlapRecordReaderUtils {
+
+  private static final HadoopShims SHIMS = HadoopShimsFactory.get();
+  private static final Logger LOG = 
LoggerFactory.getLogger(LlapRecordReaderUtils.class);
+
+  static HadoopShims.ZeroCopyReaderShim createZeroCopyShim(FSDataInputStream 
file, CompressionCodec codec,
+  RecordReaderUtils.ByteBufferAllocatorPool pool) throws IOException {
+return codec == null || (codec instanceof DirectDecompressionCodec && 
((DirectDecompressionCodec) codec)
+.isAvailable()) ? null : SHIMS.getZeroCopyReader(file, pool);

Review comment:
   I think this was equivalent to `codec == null || (codec instanceof 
DirectDecompressionCodec && ((DirectDecompressionCodec) codec).isAvailable()) ? 
SHIMS.getZeroCopyReader(file, pool) : null`
before. Looks like the `null : SHIMS.getZeroCopyReader` branches got inverted.
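The branch-order bug flagged in this review can be illustrated with a minimal sketch. The types below are simplified stand-ins, not the actual ORC/Hive shim classes: a zero-copy reader should be returned only when the codec is null or direct decompression is available, and null otherwise.

```java
// Sketch of the intended selection logic for the zero-copy reader shim.
// Codec/DirectCodec are hypothetical stand-ins for the ORC codec interfaces.
public class ZeroCopyShimSelect {

    interface Codec { }
    interface DirectCodec extends Codec { boolean isAvailable(); }

    static String createZeroCopyShim(Codec codec) {
        boolean canZeroCopy =
            codec == null
                || (codec instanceof DirectCodec && ((DirectCodec) codec).isAvailable());
        // Swapping these two branches is exactly the inversion noted above.
        return canZeroCopy ? "zero-copy-reader" : null;
    }

    public static void main(String[] args) {
        System.out.println(createZeroCopyShim(null)); // prints zero-copy-reader
    }
}
```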





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545617)
Time Spent: 6h 40m  (was: 6.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks 

[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545610
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 21:38
Start Date: 01/Feb/21 21:38
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r568157322



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set<String> 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   I still do not understand why we need to change LLAP Allocator's maximum 
allocation size. Does LLAP allocator serve only ORC writers? I think it is used 
for other buffer needs too.
   
   Hive depends on ORC. So I don't understand how ORC uses 
LLAP_ALLOCATOR_MAX_ALLOC for anything. We pass ORC writers the appropriate 
configs. If ORC writers need a smaller buffer, we can configure that for those 
writers via WriterOptions. There is no need to change llap allocator's settings 
for that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545610)
Time Spent: 6.5h  (was: 6h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24716) jQuery file symlink is replaced by physical file which requires changes on both the places

2021-02-01 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24716:
--
Description: 
HIVE-22066 replaced symlink

llap-server/src/main/resources/hive-webapps/llap/js/jquery.min.js -> 
service/src/resources/hive-webapps/static/js/jquery.min.js

with a physical file; whenever the jQuery version gets upgraded, the same 
change needs to be made in both places.

  was:
HIVE-22099 replaced symlink

llap-server/src/main/resources/hive-webapps/llap/js/jquery.min.js -> 
service/src/resources/hive-webapps/static/js/jquery.min.js

with a physical file; whenever the jQuery version gets upgraded, the same 
change needs to be made in both places.


> jQuery file symlink is replaced by physical file which requires changes on 
> both the places
> --
>
> Key: HIVE-24716
> URL: https://issues.apache.org/jira/browse/HIVE-24716
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Priority: Major
>
> HIVE-22066 replaced symlink
> llap-server/src/main/resources/hive-webapps/llap/js/jquery.min.js -> 
> service/src/resources/hive-webapps/static/js/jquery.min.js
> with a physical file; whenever the jQuery version gets upgraded, the same 
> change needs to be made in both places.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow

2021-02-01 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276595#comment-17276595
 ] 

David Mollitor commented on HIVE-24693:
---

[~klcopp] That may be the case, but there was a unit test that was generating 
negative dates.  That's what broke my work. Ugh.

> Parquet Timestamp Values Read/Write Very Slow
> -
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  To do this, it calls {{toString()}} on 
> the timestamp object and then parses the String.  These particular timestamps 
> do not carry a timezone, so the string is something like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone, and 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the time stamp
> * Parses it (throws an exception, parses again)
> There is no need to do this kind of string manipulation/parsing; it should 
> just be using the epoch millis/seconds/time stored internally in the 
> Timestamp object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
> return parse(ts.toString(), defaultTimeZone);
>   }
> {code}
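The epoch-based conversion the description argues for can be sketched as follows. `java.sql.Timestamp` and `ZonedDateTime` are stand-ins for Hive's own Timestamp/TimestampTZ types (an assumption for illustration); the point is that no toString()/parse round trip is needed.

```java
import java.sql.Timestamp;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch: convert a time-zone-less Timestamp to a zoned value directly from
// its stored epoch fields, with no string formatting or parsing involved.
public class DirectTimestampConvert {

    static ZonedDateTime convert(Timestamp ts, ZoneId defaultTimeZone) {
        // toInstant() reads the epoch millis/nanos already held by the object.
        return ts.toInstant().atZone(defaultTimeZone);
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(0L); // the epoch
        System.out.println(convert(ts, ZoneId.of("UTC")));
    }
}
```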



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545537
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 19:24
Start Date: 01/Feb/21 19:24
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568080013



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", maxExecutorMemory / 
(1024L * 1024L));

Review comment:
   maxExecutorMemory should be in Bytes, see
   
https://github.com/apache/hive/blob/aee31f8d03a9d9b1fce3b5cc8788b2238cbaf351/ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java#L106
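The unit confusion in this thread comes down to the divisor. A minimal sketch (hypothetical names, not the actual MemoryInfo code) of the bytes-to-MB conversion the log line performs when the value really is in bytes; if the value were already in MB, the same divisor would shrink it by a factor of 2^20:

```java
// Sketch: dividing a byte count by 1024*1024 yields MB for logging.
public class MemoryLogSketch {

    static long bytesToMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long maxExecutorMemory = 4L * 1024 * 1024 * 1024; // 4 GB, in bytes
        System.out.println("Using LLAP registry executor MB " + bytesToMb(maxExecutorMemory)); // prints ... MB 4096
    }
}
```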





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545537)
Time Spent: 2h 50m  (was: 2h 40m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless, and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting where the 
> configuration values came from.
>  
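The proposed "last resort" fallback chain can be sketched as follows. Method and constant names here are illustrative, not the actual DagUtils code: take the first positive value among the Hive setting, the MapReduce setting, and a hard-coded sane default.

```java
// Sketch of the proposed last-resort logic: Hive conf first, then MapReduce
// conf, then a hard-coded default when both configured values are invalid.
public class ContainerResourceFallback {

    static final int DEFAULT_MAP_MEMORY_MB = 1024; // assumed sane default

    static int resolveMemoryMb(int hiveTezContainerSize, int mrMapMemoryMb) {
        if (hiveTezContainerSize > 0) {
            return hiveTezContainerSize;
        }
        if (mrMapMemoryMb > 0) {
            return mrMapMemoryMb;
        }
        // Last resort: both configured values were invalid (<= 0).
        return DEFAULT_MAP_MEMORY_MB;
    }

    public static void main(String[] args) {
        System.out.println(resolveMemoryMb(-1, 0)); // prints 1024
    }
}
```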



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545536
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 19:23
Start Date: 01/Feb/21 19:23
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568080013



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", maxExecutorMemory / 
(1024L * 1024L));

Review comment:
   maxExecutorMemory should be in Bytes





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545536)
Time Spent: 2h 40m  (was: 2.5h)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless, and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545514
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 19:00
Start Date: 01/Feb/21 19:00
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568066289



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", maxExecutorMemory / 
(1024L * 1024L));

Review comment:
   This just does not look correct to me.
   
   Are you trying to log the value here in MB?  I think 
`this.maxExecutorMemory` is in MB already.  It is currently converting MB to 
TB with this large divisor.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545514)
Time Spent: 2.5h  (was: 2h 20m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24673) Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24673?focusedWorklogId=545506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545506
 ]

ASF GitHub Bot logged work on HIVE-24673:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 18:42
Start Date: 01/Feb/21 18:42
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #1902:
URL: https://github.com/apache/hive/pull/1902#issuecomment-771071771


   @kgyrtkirk I believe I addressed all the comments. Can you take another look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545506)
Time Spent: 2h 50m  (was: 2h 40m)

> Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap
> -
>
> Key: HIVE-24673
> URL: https://issues.apache.org/jira/browse/HIVE-24673
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> These test drivers should run on llap. Otherwise we can run into situations 
> where certain queries correctly fail on MapReduce but not on Tez.
> Also, it is better if the negative cli drivers do not mask "Caused by" lines in 
> test output. Otherwise, a query may start to fail for reasons other than the 
> expected one and we do not realize it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24693) Parquet Timestamp Values Read/Write Very Slow

2021-02-01 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276566#comment-17276566
 ] 

Karen Coppage commented on HIVE-24693:
--

[~belugabehr] Per the Wiki, Hive can handle years 0001-. However, it 
doesn't really complain about years outside of that range. I once tried to get 
Hive to enforce this range but didn't get very far. FYI :)

BTW let me know if/when you want a review!

> Parquet Timestamp Values Read/Write Very Slow
> -
>
> Key: HIVE-24693
> URL: https://issues.apache.org/jira/browse/HIVE-24693
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Parquet {{DataWriteableWriter}} relies on {{NanoTimeUtils}} to convert a 
> timestamp object into a binary value.  The way in which it does this: it 
> calls {{toString()}} on the timestamp object, and then parses the String.  
> These timestamps do not carry a timezone, so the string is something 
> like:
> {{2021-21-03 12:32:23....}}
> The parse code tries to parse the string assuming there is a time zone, and 
> if not, falls back and applies the provided "default time zone".  As was 
> noted in [HIVE-24353], if something fails to parse, it is very expensive to 
> try to parse again.  So, for each timestamp in the Parquet file, it:
> * Builds a string from the time stamp
> * Parses it (throws an exception, parses again)
> There is no need to do this kind of string manipulation/parsing; it should 
> just use the epoch millis/seconds stored internally in the Timestamp 
> object.
> {code:java}
>   // Converts Timestamp to TimestampTZ.
>   public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) {
> return parse(ts.toString(), defaultTimeZone);
>   }
> {code}
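
A hedged sketch of the direct conversion the description calls for, using `java.sql.Timestamp` and `java.time` stand-ins for Hive's Timestamp/TimestampTZ types (this is not the actual Hive API):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch: derive the zoned value arithmetically from the timestamp's epoch
// fields instead of formatting it to a String and re-parsing it.
public class TimestampConvertSketch {
    static ZonedDateTime convert(java.sql.Timestamp ts, ZoneId defaultTimeZone) {
        // getTime() is epoch millis (floor-divide so pre-1970 values stay correct);
        // getNanos() carries the full sub-second part.
        Instant instant = Instant.ofEpochSecond(Math.floorDiv(ts.getTime(), 1000L),
                ts.getNanos());
        return instant.atZone(defaultTimeZone);
    }

    public static void main(String[] args) {
        java.sql.Timestamp ts = new java.sql.Timestamp(0L); // the epoch
        System.out.println(convert(ts, ZoneId.of("UTC")));
    }
}
```

No string is ever built or parsed, so the exception-driven retry described above disappears entirely.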



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545493
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 18:19
Start Date: 01/Feb/21 18:19
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-771057135


   Tests just passed and comments are addressed above.
   @mustafaiman @jcamachor please take another look and let me know what you 
think :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545493)
Time Spent: 6h 20m  (was: 6h 10m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the 1.5.X version, and in order to take advantage of 
> the latest ORC improvements, such as column encryption, we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though the ORC reader could work out of the box, Hive LLAP depends heavily 
> on internal ORC APIs, e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks etc.), the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545488
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 18:11
Start Date: 01/Feb/21 18:11
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568034425



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", (maxExecutorMemory / 
1024L * 1024L));

Review comment:
   Good catch, thanks! 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545488)
Time Spent: 2h 20m  (was: 2h 10m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545485
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 18:04
Start Date: 01/Feb/21 18:04
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568030343



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", (maxExecutorMemory / 
1024L * 1024L));

Review comment:
   Thanks, however `/ 1024L * 1024L` is a no-op
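
The no-op noted in this comment comes from operator precedence: `/` and `*` are left-associative, so `x / 1024L * 1024L` divides and then multiplies back, merely rounding `x` down to a multiple of 1024. A minimal demonstration:

```java
// Demonstrates why "x / 1024L * 1024L" is effectively a no-op: left-to-right
// evaluation divides and then multiplies back, so only "x / (1024L * 1024L)"
// performs the intended bytes-to-MiB conversion.
public class PrecedenceSketch {
    public static void main(String[] args) {
        long bytes = 3L * 1024 * 1024 * 1024;          // 3 GiB in bytes
        System.out.println(bytes / 1024L * 1024L);     // 3221225472: unchanged (multiple of 1024)
        System.out.println(bytes / (1024L * 1024L));   // 3072: the intended MiB value
    }
}
```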





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545485)
Time Spent: 2h 10m  (was: 2h)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 6:02 PM:
---

Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and it 
will look like it was created by max_statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

The change is backward compatible with the prior implementation, while upsizing 
the range wouldn't be.
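
The wrap-around scheme described here can be sketched as plain arithmetic. Names are illustrative; this is not Hive's actual BucketCodec.

```java
// Sketch of letting the bucket id overflow into the statement id: ids beyond
// the 12-bit field wrap to bucket 0, and the number of wraps is carried as an
// offset on top of the maximum statement id.
public class BucketOverflowSketch {
    static final int MAX_BUCKETS = 1 << 12; // the 12-bit bucket id field holds 0..4095

    static int effectiveBucket(int taskId) {
        return taskId % MAX_BUCKETS;        // the 4096th task wraps back to bucket_0
    }

    static int statementIdOffset(int taskId) {
        return taskId / MAX_BUCKETS;        // how many times the id wrapped
    }

    public static void main(String[] args) {
        System.out.println(effectiveBucket(4095));   // 4095: last id that fits
        System.out.println(effectiveBucket(4096));   // 0: wrapped around
        System.out.println(statementIdOffset(4096)); // 1: as if created by max_statement_id + 1
    }
}
```

Because equal `effectiveBucket` values line up across statement ids, compaction can still merge the corresponding files.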

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When TEZ starts more tasks 
than 4095 it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundred thousands of tasks are started than we would and up 
having hundred thousands of files and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let bucket id 
overflow into the statement id, so that the 4096th bucket will bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545480
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:55
Start Date: 01/Feb/21 17:55
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568023927



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", (maxExecutorMemory / 
1024L * 1024L));

Review comment:
   maxExecutorMemory is in bytes (see how we handle it on Tez and MR mode) 
-- however LlapClusterState does the conversion to bytes (from MB) prematurely 
as part of: 
https://github.com/apache/hive/blob/5eebbdf7c5750b31e1c43fe576fc0ab728bce05c/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java#L147





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545480)
Time Spent: 2h  (was: 1h 50m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545479
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:54
Start Date: 01/Feb/21 17:54
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568023927



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", (maxExecutorMemory / 
1024L * 1024L));

Review comment:
   maxExecutorMemory is in bytes (see how we handle it on Tez and MR mode) 
-- however LlapCluster state does the conversion prematurely as part of: 
https://github.com/apache/hive/blob/5eebbdf7c5750b31e1c43fe576fc0ab728bce05c/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java#L147





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545479)
Time Spent: 1h 50m  (was: 1h 40m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545475
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:49
Start Date: 01/Feb/21 17:49
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568020306



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -46,30 +48,25 @@ public MemoryInfo(Configuration conf) {
   llapInfo.initClusterInfo();
   if (llapInfo.hasClusterInfo()) {
 this.maxExecutorMemory = llapInfo.getMemoryPerExecutor();
+LOG.info("Using LLAP registry executor MB {}", (maxExecutorMemory / 
1024L * 1024L));

Review comment:
   I think there's a typo here; maxExecutorMemory I believe is 
already in MB?  Regardless, there's a typo here... I think you want to multiply by 
1024 twice, not divide :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545475)
Time Spent: 1h 40m  (was: 1.5h)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545468
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:35
Start Date: 01/Feb/21 17:35
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r56808



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -1734,7 +1735,7 @@ private boolean checkMapSideAggregation(GroupByOperator 
gop,
 float hashAggMaxThreshold = 
conf.getFloatVar(HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);
 
 // get available map memory
-long totalMemory = StatsUtils.getAvailableMemory(conf, true) * 1000L * 
1000L;
+long totalMemory = DagUtils.getContainerResource(conf).getMemorySize() 
* 1000L * 1000L;

Review comment:
   ACK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545468)
Time Spent: 1.5h  (was: 1h 20m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545458
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:22
Start Date: 01/Feb/21 17:22
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568002349



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
##
@@ -1734,7 +1735,7 @@ private boolean checkMapSideAggregation(GroupByOperator 
gop,
 float hashAggMaxThreshold = 
conf.getFloatVar(HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);
 
 // get available map memory
-long totalMemory = StatsUtils.getAvailableMemory(conf, true) * 1000L * 
1000L;
+long totalMemory = DagUtils.getContainerResource(conf).getMemorySize() 
* 1000L * 1000L;

Review comment:
   Existing issue, but please address here: this should be in MiB (1024L) 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545458)
Time Spent: 1h 20m  (was: 1h 10m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code:java|title=DagUtils.java}
> public static Resource getContainerResource(Configuration conf) {
> int memory = HiveConf.getIntVar(conf, 
> HiveConf.ConfVars.HIVETEZCONTAINERSIZE) > 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCONTAINERSIZE) :
>   conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
> MRJobConfig.DEFAULT_MAP_MEMORY_MB);
> int cpus = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) > 
> 0 ?
>   HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVETEZCPUVCORES) :
>   conf.getInt(MRJobConfig.MAP_CPU_VCORES, 
> MRJobConfig.DEFAULT_MAP_CPU_VCORES);
> return Resource.newInstance(memory, cpus);
>   }
> {code}
> If Tez Container Size or VCores is an invalid value ( <= 0 ) then it falls 
> back onto the MapReduce configurations, but if the MapReduce configurations 
> have invalid values ( <= 0 ), they are accepted regardless and this will 
> cause failures down the road.
> This code should also check the MapReduce values and fall back to MapReduce 
> default values if they are <= 0.
> Also, some logging would be nice here too, reporting about where the 
> configuration values came from.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545457
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 17:20
Start Date: 01/Feb/21 17:20
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r568000803



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
##
@@ -53,11 +54,16 @@ public MemoryInfo(Configuration conf) {
 this.maxExecutorMemory = memPerInstance / numExecutors;
   }
 } else if (isTez) {
-int containerSizeMb = StatsUtils.getAvailableMemory(conf, true);
+long containerSizeMb = 
DagUtils.getContainerResource(conf).getMemorySize();
 float heapFraction = HiveConf.getFloatVar(conf, 
HiveConf.ConfVars.TEZ_CONTAINER_MAX_JAVA_HEAP_FRACTION);
 this.maxExecutorMemory = (long) ((containerSizeMb * 1024L * 1024L) * 
heapFraction);
 } else {
-  this.maxExecutorMemory = StatsUtils.getAvailableMemory(conf, false) * 
1024L * 1024L;
+  this.maxExecutorMemory =
+  conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
MRJobConfig.DEFAULT_MAP_MEMORY_MB) * 1024L * 1024L;
+  // this can happen when config is explicitly set to "-1", in which case 
defaultValue also does not work
+  if (maxExecutorMemory < 0) {
+maxExecutorMemory = MRJobConfig.DEFAULT_MAP_MEMORY_MB * 1024L * 1024L;

Review comment:
   Log message please
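   The diff above hinges on a subtle behavior of `getInt`-style lookups: the supplied default only applies when the key is absent, not when it is explicitly set to an invalid value such as "-1". A stand-alone illustration, using `java.util.Properties` as a stand-in for Hadoop's `Configuration`:

```java
import java.util.Properties;

// Demonstrates why a lookup default does not guard against an explicitly
// configured invalid value: the default is used only when the key is missing,
// so a separate sanity check (as in the patch under review) is still needed.
public class DefaultVsInvalid {
    static int getInt(Properties conf, String key, int def) {
        String v = conf.getProperty(key);
        return v == null ? def : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapreduce.map.memory.mb", "-1"); // explicitly invalid
        int memoryMb = getInt(conf, "mapreduce.map.memory.mb", 1024);
        // The default did not apply, so clamp to a sane last resort:
        if (memoryMb <= 0) {
            memoryMb = 1024;
        }
        System.out.println(memoryMb); // prints 1024
    }
}
```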





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545457)
Time Spent: 1h 10m  (was: 1h)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545437
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:41
Start Date: 01/Feb/21 16:41
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1933:
URL: https://github.com/apache/hive/pull/1933#issuecomment-770990643


   > I think all the changes should be rolled into `getContainerResource` and 
then everything else calls that.
   
   Sure, I guess you mean always returning a Resource instance, and using 
memory or CPU wherever needed?





Issue Time Tracking
---

Worklog Id: (was: 545437)
Time Spent: 1h  (was: 50m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545433
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:40
Start Date: 01/Feb/21 16:40
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r567969784



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -1915,15 +1915,26 @@ private static String getFullyQualifiedName(String... 
names) {
 return result;
   }
 
-  public static long getAvailableMemory(Configuration conf) {
-int memory = HiveConf.getIntVar(conf, 
HiveConf.ConfVars.HIVETEZCONTAINERSIZE);
-if (memory <= 0) {
-  memory = conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
MRJobConfig.DEFAULT_MAP_MEMORY_MB);
-  if (memory <= 0) {
-memory = 1024;
+  /**
+   * Get the Container Memory Size from given conf (in MB).
+   * Returns HIVETEZCONTAINERSIZE when set, otherwise falls back to 
MAP_MEMORY_MB.
+   * When MAP_MEMORY_MB is explicitly set to "-1" uses DEFAULT_MAP_MEMORY_MB 
(1024) to avoid failures.
+   * @param conf Configuration
+   * @param isTez true if in Tez mode
+   * @return Container Memory Size in MB
+   */
+  public static int getAvailableMemory(Configuration conf, boolean isTez) {
+int containerMemSizeMb = HiveConf.getIntVar(conf, 
HiveConf.ConfVars.HIVETEZCONTAINERSIZE);

Review comment:
   The reason is I reused the logic for MR mode: 
https://github.com/apache/hive/pull/1933/files#diff-e3956e96fab0a8e9604b0e484a2ee7db29b83a62fdae4d08de068e4712191663L67







Issue Time Tracking
---

Worklog Id: (was: 545433)
Time Spent: 50m  (was: 40m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545432
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:38
Start Date: 01/Feb/21 16:38
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1933:
URL: https://github.com/apache/hive/pull/1933#issuecomment-770988991


   I think all the changes should be rolled into `getContainerResource` and 
then everything else calls that.





Issue Time Tracking
---

Worklog Id: (was: 545432)
Time Spent: 40m  (was: 0.5h)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545431
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:36
Start Date: 01/Feb/21 16:36
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1933:
URL: https://github.com/apache/hive/pull/1933#discussion_r567967118



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java
##
@@ -1915,15 +1915,26 @@ private static String getFullyQualifiedName(String... 
names) {
 return result;
   }
 
-  public static long getAvailableMemory(Configuration conf) {
-int memory = HiveConf.getIntVar(conf, 
HiveConf.ConfVars.HIVETEZCONTAINERSIZE);
-if (memory <= 0) {
-  memory = conf.getInt(MRJobConfig.MAP_MEMORY_MB, 
MRJobConfig.DEFAULT_MAP_MEMORY_MB);
-  if (memory <= 0) {
-memory = 1024;
+  /**
+   * Get the Container Memory Size from given conf (in MB).
+   * Returns HIVETEZCONTAINERSIZE when set, otherwise falls back to 
MAP_MEMORY_MB.
+   * When MAP_MEMORY_MB is explicitly set to "-1" uses DEFAULT_MAP_MEMORY_MB 
(1024) to avoid failures.
+   * @param conf Configuration
+   * @param isTez true if in Tez mode
+   * @return Container Memory Size in MB
+   */
+  public static int getAvailableMemory(Configuration conf, boolean isTez) {
+int containerMemSizeMb = HiveConf.getIntVar(conf, 
HiveConf.ConfVars.HIVETEZCONTAINERSIZE);

Review comment:
   Not sure why `isTez` was added here... why return anything if it's not 
Tez, as this method is strictly for determining the Tez container size.







Issue Time Tracking
---

Worklog Id: (was: 545431)
Time Spent: 0.5h  (was: 20m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545430
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:32
Start Date: 01/Feb/21 16:32
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1933:
URL: https://github.com/apache/hive/pull/1933#issuecomment-770985003


   Similar simplification can be applied in DagUtils getContainerResource method





Issue Time Tracking
---

Worklog Id: (was: 545430)
Time Spent: 20m  (was: 10m)

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?focusedWorklogId=545429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545429
 ]

ASF GitHub Bot logged work on HIVE-24707:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 16:30
Start Date: 01/Feb/21 16:30
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1933:
URL: https://github.com/apache/hive/pull/1933


   ### What changes were proposed in this pull request?
   Apply Sane Default for Tez Containers as Last Resort
   
   
   ### Why are the changes needed?
   If Tez Container Size is an invalid value (<= 0), then it falls back to 
the MapReduce configurations; but if the MapReduce configurations have invalid 
values (<= 0), they are accepted regardless, and this will cause failures down 
the road.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing tests
   





Issue Time Tracking
---

Worklog Id: (was: 545429)
Remaining Estimate: 0h
Time Spent: 10m

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24707:
--
Labels: pull-request-available  (was: )

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM:
---

Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows. See TEZ-4271 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across 
statement ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.
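
A minimal sketch of the proposed overflow, assuming the quoted bit layout (a 12-bit bucket id) and treating everything beyond 12 bits as a statement-id offset; the method name `split` is illustrative, not Hive's actual BucketCodec API:

```java
// Illustrative only: lets an oversized task id overflow into the statement id,
// so task 4096 becomes bucket 0 of statement_id + 1 instead of overflowing
// the 12-bit bucket field.
public class BucketOverflow {
    static final int BUCKET_BITS = 12;
    static final int MAX_BUCKETS = 1 << BUCKET_BITS; // 4096

    /** Returns {bucketId, statementId} after letting the task id overflow. */
    static int[] split(int taskId, int statementId) {
        int bucketId = taskId & (MAX_BUCKETS - 1);          // low 12 bits
        int stmtId = statementId + (taskId >> BUCKET_BITS); // overflow carries here
        return new int[] { bucketId, stmtId };
    }

    public static void main(String[] args) {
        int[] r = split(4096, 0);
        System.out.println(r[0] + " " + r[1]); // prints "0 1"
    }
}
```

This keeps every file's bucket id within the existing 12-bit field, so compaction can still line up bucket N across statements.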

 

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows.
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across 
statement ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM:
---

Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across 
statement ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows. See TEZ-4271 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across 
statement ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Commented] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar commented on HIVE-24715:
--

Currently the bucketId field is stored in 12 bits. When Tez starts more than 
4095 tasks, it overflows.
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started, then we would end up 
having hundreds of thousands of files, and since compaction works across 
statement ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket will be bucket_0 and 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Assigned] (HIVE-24707) Apply Sane Default for Tez Containers as Last Resort

2021-02-01 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24707:
-

Assignee: Panagiotis Garefalakis

> Apply Sane Default for Tez Containers as Last Resort
> 
>
> Key: HIVE-24707
> URL: https://issues.apache.org/jira/browse/HIVE-24707
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>





[jira] [Assigned] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24715:



> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Commented] (HIVE-24711) hive metastore memory leak

2021-02-01 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276407#comment-17276407
 ] 

Karen Coppage commented on HIVE-24711:
--

Do you see an impersonation (ugi.doAs()) failure in compactor.Initiator in HMS 
logs? If so, HIVE-22700 will help.

> hive metastore memory leak
> --
>
> Key: HIVE-24711
> URL: https://issues.apache.org/jira/browse/HIVE-24711
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.1.0
>Reporter: LinZhongwei
>Priority: Major
>
> hdp version:3.1.5.31-1
> hive version:3.1.0.3.1.5.31-1
> hadoop version:3.1.1.3.1.5.31-1
> We find that the Hive metastore has a memory leak if we set 
> compactor.initiator.on to true.
> If we disable the configuration, the memory leak disappears.
> How can we resolve this problem?
> Even if we set the heap size of the Hive metastore to 40 GB, after 1 month 
> the metastore service goes down with an out-of-memory error.





[jira] [Work logged] (HIVE-24525) Invite reviewers automatically by file name patterns

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24525?focusedWorklogId=545399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545399
 ]

ASF GitHub Bot logged work on HIVE-24525:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 15:24
Start Date: 01/Feb/21 15:24
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1767:
URL: https://github.com/apache/hive/pull/1767


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545399)
Time Spent: 1h  (was: 50m)

> Invite reviewers automatically by file name patterns
> 
>
> Key: HIVE-24525
> URL: https://issues.apache.org/jira/browse/HIVE-24525
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I wrote about this in an 
> [email|http://mail-archives.apache.org/mod_mbox/hive-dev/202006.mbox/%3c324a0a23-5841-09fe-a993-1a095035e...@rxd.hu%3e]
>  a long time ago...
> It could help in keeping an eye on some specific parts, e.g. Thrift and 
> parser changes.





[jira] [Work logged] (HIVE-24478) Subquery GroupBy with Distinct SemanticException: Invalid column reference

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24478?focusedWorklogId=545371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545371
 ]

ASF GitHub Bot logged work on HIVE-24478:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 14:48
Start Date: 01/Feb/21 14:48
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1732:
URL: https://github.com/apache/hive/pull/1732#issuecomment-769899486


   Hey @jcamachor @maheshk114  @kasakrisz  can you please take a look?





Issue Time Tracking
---

Worklog Id: (was: 545371)
Time Spent: 0.5h  (was: 20m)

> Subquery GroupBy with Distinct SemanticException: Invalid column reference
> --
>
> Key: HIVE-24478
> URL: https://issues.apache.org/jira/browse/HIVE-24478
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE tmp_src1(
>   `npp` string,
>   `nsoc` string) stored as orc;
> INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');
> SELECT `min_nsoc`
> FROM
>  (SELECT `npp`,
>  MIN(`nsoc`) AS `min_nsoc`,
>  COUNT(DISTINCT `nsoc`) AS `nb_nsoc`
>   FROM tmp_src1
>   GROUP BY `npp`) `a`
> WHERE `nb_nsoc` > 0;
> {code}
> Issue:
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference 'nsoc' at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
> {code}
> Query runs fine when we include `nb_nsoc` in the Select expression





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545316
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:07
Start Date: 01/Feb/21 13:07
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r566908774



##
File path: ql/src/test/results/clientpositive/tez/orc_merge12.q.out
##
@@ -162,8 +162,8 @@ Stripe Statistics:
 Column 6: count: 9174 hasNull: true min: -16379.0 max: 9763215.5639 sum: 
5.62236530305E7
 Column 7: count: 12288 hasNull: false min: 
00020767-dd8f-4f4d-bd68-4b7be64b8e44 max: fffa3516-e219-4027-b0d3-72bb2e676c52 
sum: 442368
 Column 8: count: 12288 hasNull: false min: 
000976f7-7075-4f3f-a564-5a375fafcc101416a2b7-7f64-41b7-851f-97d15405037e max: 
fffd0642-5f01-48cd-8d97-3428faee49e9b39f2b4c-efdc-4e5f-9ab5-4aa5394cb156 sum: 
884736
-Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
-Column 10: count: 9174 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.808
+Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 
1969-12-31 16:00:30.80899

Review comment:
   Yes, this is expected as we are now supporting Nanosecond precision for 
Timestamps: https://issues.apache.org/jira/browse/ORC-663







Issue Time Tracking
---

Worklog Id: (was: 545316)
Time Spent: 6h 10m  (was: 6h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545313
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:05
Start Date: 01/Feb/21 13:05
Worklog Time Spent: 10m 
  Work Description: pgaref edited a comment on pull request #1823:
URL: https://github.com/apache/hive/pull/1823#issuecomment-769755146


   > I only partially reviewed this. Will continue reviewing.
   > One question: I see we do not care about column encryption related 
arguments in multiple places. Is it because column encryption is not supported?
   
   Hey @mustafaiman, good question with a complicated answer -- while creating 
this I also did some digging to find out what's supported and what's not. To sum 
up my findings: 
   
   -  It looks like we are currently able to encrypt entire tables and/or data 
on hdfs using kms: HIVE-8065
   -  Support for column level encryption/decryption (passing some encryption 
setting to the Table props and let Hive take care of the rest) started more 
than a while ago as part of HIVE-6329 
   -  There was a community discussion as part of HIVE-21848 to unify 
encryption table properties (at least for ORC and Parquet) that concluded in 
the accepted options
   - However, these properties are still not propagated to the tables: 
HIVE-21849
   
   I believe part of the reason is that Hive already integrates with Apache 
Ranger that can restrict user access to particular columns and also adds 
data-masking on top.
   However, I am more than happy to discuss reviving column encryption at 
some point.
   





Issue Time Tracking
---

Worklog Id: (was: 545313)
Time Spent: 6h  (was: 5h 50m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545312
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:03
Start Date: 01/Feb/21 13:03
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567807669



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4509,7 +4509,7 @@ private static void populateLlapDaemonVarsSet(Set 
llapDaemonVarsSetLocal
 "Minimum allocation possible from LLAP buddy allocator. Allocations 
below that are\n" +
 "padded to minimum allocation. For ORC, should generally be the same 
as the expected\n" +
 "compression buffer size, or next lowest power of 2. Must be a power 
of 2."),
-LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "16Mb", new 
SizeValidator(),
+LLAP_ALLOCATOR_MAX_ALLOC("hive.llap.io.allocator.alloc.max", "4Mb", new 
SizeValidator(),

Review comment:
   The issue here is that LLAP_ALLOCATOR_MAX_ALLOC is also used as the ORC 
Writer buffer size (thus the change).
   
   The initial buffer size check was introduced in 
[ORC-238](https://github.com/apache/orc/pull/171/files), even though it was only 
applied when the buffer size was set from table properties. Later, in ORC 1.6 
this was enforced for the [Writer buffer size in 
general](https://github.com/apache/orc/blob/0128f817b0ab28fa2d0660737234ac966f0f5c50/java/core/src/java/org/apache/orc/impl/WriterImpl.java#L171).
   
   The max bufferSize argument can be up to 2^(3*8 - 1) -- meaning less than 
8 MB -- and since we enforce the size to be a power of 2, the next available 
size is 4 MB.
   
   The main question here is whether there is a reason for the upper limit to 
be < 8 MB (cc @prasanthj who might know more here) -- or whether we should 
decouple the two configurations (LLAP allocation and ORC Writer buffer size).
   
   I believe the best thing to do for now is open a new Ticket to track this 
(as this will either require more work on LLAP, or a new release on ORC) -- and 
I do not expect this to cause any major issues until then. @mustafaiman what do 
you think?
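   The cap being discussed can be sketched numerically. A minimal sketch, assuming (as described above) that the serialized buffer-size field only holds values below 2^(3*8 - 1) and that sizes must be powers of two -- the class and method names are illustrative, not ORC's actual code:

```java
// Sketch of the constraint discussed above: if the buffer size must stay
// below 2^23 (just under 8 MB) and sizes are constrained to powers of two,
// the largest usable size is 4 MB.
public class OrcBufferSizeCap {

    static final int MAX_RAW_SIZE = (1 << 23) - 1; // largest value < 2^23

    /** Largest power of two that is <= both the request and the cap. */
    static int clampToPowerOfTwo(int requested) {
        int capped = Math.min(requested, MAX_RAW_SIZE);
        return Integer.highestOneBit(capped); // highest set bit = floor power of 2
    }

    public static void main(String[] args) {
        // A 16 MB request (the old hive.llap.io.allocator.alloc.max default)
        // clamps down to 4 MB.
        System.out.println(clampToPowerOfTwo(16 * 1024 * 1024) / (1024 * 1024)); // 4
    }
}
```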







Issue Time Tracking
---

Worklog Id: (was: 545312)
Time Spent: 5h 50m  (was: 5h 40m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-22944) Upgrade to Kryo5

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22944?focusedWorklogId=545310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545310
 ]

ASF GitHub Bot logged work on HIVE-22944:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:01
Start Date: 01/Feb/21 13:01
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1798:
URL: https://github.com/apache/hive/pull/1798#issuecomment-770839739


   merged to master, thanks @pgaref for the review!





Issue Time Tracking
---

Worklog Id: (was: 545310)
Time Spent: 2h 50m  (was: 2h 40m)

> Upgrade to Kryo5
> 
>
> Key: HIVE-22944
> URL: https://issues.apache.org/jira/browse/HIVE-22944
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22944.01.patch, kryo4_vs_5_benchmark.log
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Maybe we should consider upgrading to kryo5 (plan ser/deser). Not sure about 
> performance benefits, but looking at the code, e.g. FieldSerializer in Kryo5 
> seems to let us extend it easier (less private fields), which could be a 
> benefit if we want to change its behavior, e.g. defining different logic for 
> different fields of an object.
> Kryo 4 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/kryo-4/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> Kryo 5 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> UPDATE: currently we are at kryo 5.0.3
> TODO: why kryo-shaded artifact has been used so far?
> https://javalibs.com/artifact/com.esotericsoftware/kryo-shaded
> "This contains the shaded reflectasm jar to prevent conflicts with other 
> versions of asm."





[jira] [Updated] (HIVE-22944) Upgrade to Kryo5

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-22944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-22944:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade to Kryo5
> 
>
> Key: HIVE-22944
> URL: https://issues.apache.org/jira/browse/HIVE-22944
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22944.01.patch, kryo4_vs_5_benchmark.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Maybe we should consider upgrading to kryo5 (plan ser/deser). Not sure about 
> performance benefits, but looking at the code, e.g. FieldSerializer in Kryo5 
> seems to let us extend it easier (less private fields), which could be a 
> benefit if we want to change its behavior, e.g. defining different logic for 
> different fields of an object.
> Kryo 4 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/kryo-4/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> Kryo 5 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> UPDATE: currently we are at kryo 5.0.3
> TODO: why kryo-shaded artifact has been used so far?
> https://javalibs.com/artifact/com.esotericsoftware/kryo-shaded
> "This contains the shaded reflectasm jar to prevent conflicts with other 
> versions of asm."





[jira] [Work logged] (HIVE-22944) Upgrade to Kryo5

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22944?focusedWorklogId=545307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545307
 ]

ASF GitHub Bot logged work on HIVE-22944:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 13:01
Start Date: 01/Feb/21 13:01
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #1798:
URL: https://github.com/apache/hive/pull/1798


   





Issue Time Tracking
---

Worklog Id: (was: 545307)
Time Spent: 2h 40m  (was: 2.5h)

> Upgrade to Kryo5
> 
>
> Key: HIVE-22944
> URL: https://issues.apache.org/jira/browse/HIVE-22944
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22944.01.patch, kryo4_vs_5_benchmark.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Maybe we should consider upgrading to kryo5 (plan ser/deser). Not sure about 
> performance benefits, but looking at the code, e.g. FieldSerializer in Kryo5 
> seems to let us extend it easier (less private fields), which could be a 
> benefit if we want to change its behavior, e.g. defining different logic for 
> different fields of an object.
> Kryo 4 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/kryo-4/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> Kryo 5 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> UPDATE: currently we are at kryo 5.0.3
> TODO: why kryo-shaded artifact has been used so far?
> https://javalibs.com/artifact/com.esotericsoftware/kryo-shaded
> "This contains the shaded reflectasm jar to prevent conflicts with other 
> versions of asm."





[jira] [Updated] (HIVE-22944) Upgrade to Kryo5

2021-02-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-22944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-22944:

Fix Version/s: 4.0.0

> Upgrade to Kryo5
> 
>
> Key: HIVE-22944
> URL: https://issues.apache.org/jira/browse/HIVE-22944
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22944.01.patch, kryo4_vs_5_benchmark.log
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Maybe we should consider upgrading to kryo5 (plan ser/deser). Not sure about 
> performance benefits, but looking at the code, e.g. FieldSerializer in Kryo5 
> seems to let us extend it easier (less private fields), which could be a 
> benefit if we want to change its behavior, e.g. defining different logic for 
> different fields of an object.
> Kryo 4 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/kryo-4/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> Kryo 5 FieldSerializer: 
> https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> UPDATE: currently we are at kryo 5.0.3
> TODO: why kryo-shaded artifact has been used so far?
> https://javalibs.com/artifact/com.esotericsoftware/kryo-shaded
> "This contains the shaded reflectasm jar to prevent conflicts with other 
> versions of asm."





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545304
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 12:51
Start Date: 01/Feb/21 12:51
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567800512



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/LocalCache.java
##
@@ -82,8 +82,7 @@ public void put(Path path, OrcTail tail) {
 if (bb.capacity() != bb.remaining()) {
   throw new RuntimeException("Bytebuffer allocated for path: " + path + " 
has remaining: " + bb.remaining() + " != capacity: " + bb.capacity());
 }
-cache.put(path, new TailAndFileData(tail.getFileTail().getFileLength(),
-tail.getFileModificationTime(), bb.duplicate()));
+cache.put(path, new TailAndFileData(bb.limit(), 
tail.getFileModificationTime(), bb.duplicate()));

Review comment:
    But I agree, the cache should be populated with the original 
**getFileTail().getFileLength()** as it is afterwards used for comparison (thus 
I reverted this change) -- however, the call site where 
ReaderImpl.extractFileTail is now invoked uses the buffer size instead.







Issue Time Tracking
---

Worklog Id: (was: 545304)
Time Spent: 5h 40m  (was: 5.5h)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.7

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=545302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545302
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 12:48
Start Date: 01/Feb/21 12:48
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r567799059



##
File path: 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_multicol.q.out
##
@@ -355,7 +355,7 @@ Stage-1 HIVE COUNTERS:
RECORDS_OUT_OPERATOR_TS_3: 800
TOTAL_TABLE_ROWS_WRITTEN: 0
 Stage-1 LLAP IO COUNTERS:
-   CACHE_HIT_BYTES: 138344
+   CACHE_MISS_BYTES: 138342

Review comment:
    This was a bit more complex: CacheWriter.getSparseOrcIndexFromDenseDest 
was called with colId = 0 from SerDeEncodedDataReader -- causing an 
IndexOutOfBoundsException and the cache not being populated.
   
   This is now addressed by 
https://github.com/apache/hive/pull/1823/commits/da1aa077716a65c2a02d850828b16cdeece1f574







Issue Time Tracking
---

Worklog Id: (was: 545302)
Time Spent: 5.5h  (was: 5h 20m)

> Upgrade ORC version to 1.6.7
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on 1.5.X version and in order to take advantage of 
> the latest ORC improvements such as column encryption we have to bump to 
> 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin
> Even though ORC reader could work out of the box, HIVE LLAP is heavily 
> depending on internal ORC APIs e.g., to retrieve and store File Footers, 
> Tails, streams – un/compress RG data etc. As there were many internal changes 
> from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.





[jira] [Resolved] (HIVE-24653) Race condition between compactor marker generation and get splits

2021-02-01 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits resolved HIVE-24653.

Fix Version/s: 3.1.3
   Resolution: Fixed

> Race condition between compactor marker generation and get splits
> -
>
> Key: HIVE-24653
> URL: https://issues.apache.org/jira/browse/HIVE-24653
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.1.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In a rare scenario it's possible that the compactor moves the files into the 
> final location before creating the compactor marker, so they can be fetched 
> by get splits before the marker is created.
> 2020-09-14 04:55:25,978 [ERROR] ORC_GET_SPLITS #4 |io.AcidUtils|: Failed to 
> read 
> hdfs://host/warehouse/tablespace/managed/hive/database.db/table/partition=x/base_0011535/_metadata_acid:
>  No content to map to Object due to end of input
> java.io.EOFException: No content to map to Object due to end of input





[jira] [Commented] (HIVE-24706) Spark SQL access hive on HBase table access exception

2021-02-01 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276267#comment-17276267
 ] 

Zoltan Haindrich commented on HIVE-24706:
-

I see that you have an issue with Spark - but I don't fully understand what 
would resolve it.
Please feel free to submit a PR if you would like to change the HBaseHandler.

> Spark SQL access hive on HBase table access exception
> -
>
> Key: HIVE-24706
> URL: https://issues.apache.org/jira/browse/HIVE-24706
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: zhangzhanchang
>Priority: Major
> Attachments: image-2021-01-30-15-51-58-665.png
>
>
> HiveHBaseTableInputFormat relies on two versions of InputFormat: one is 
> org.apache.hadoop.mapred.InputFormat, the other is 
> org.apache.hadoop.mapreduce.InputFormat. This causes both conditions in 
> Spark 3.0 (https://github.com/apache/spark/pull/31302) to be true:
>  # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
>  # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> !image-2021-01-30-15-51-58-665.png|width=430,height=137!
> Should HiveHBaseTableInputFormat be changed to rely on only one of 
> org.apache.hadoop.mapreduce or org.apache.hadoop.mapred?
>  
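The ambiguity the report describes can be reproduced with plain reflection. The types below are illustrative stand-ins for the Hadoop classes, not the real ones:

```java
// Stand-ins for org.apache.hadoop.mapred.InputFormat (old API) and
// org.apache.hadoop.mapreduce.InputFormat (new API).
interface OldInputFormat {}
abstract class NewInputFormat {}

// A format whose type hierarchy touches both APIs -- as the report says
// HiveHBaseTableInputFormat's does -- satisfies both assignability checks.
class DualFormat extends NewInputFormat implements OldInputFormat {}

public class AssignabilityCheck {
    public static void main(String[] args) {
        Class<?> clazz = DualFormat.class;
        // Spark's dispatch logic cannot tell the two API generations apart
        // when both of these are true for the same class.
        System.out.println(OldInputFormat.class.isAssignableFrom(clazz)); // true
        System.out.println(NewInputFormat.class.isAssignableFrom(clazz)); // true
    }
}
```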





[jira] [Work logged] (HIVE-23485) Bound GroupByOperator stats using largest NDV among columns

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23485?focusedWorklogId=545251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545251
 ]

ASF GitHub Bot logged work on HIVE-23485:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 10:40
Start Date: 01/Feb/21 10:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1926:
URL: https://github.com/apache/hive/pull/1926#issuecomment-770757613


   yeah; I also tried not to forget it :D
   I hope the changes are still good :)





Issue Time Tracking
---

Worklog Id: (was: 545251)
Time Spent: 50m  (was: 40m)

> Bound GroupByOperator stats using largest NDV among columns
> ---
>
> Key: HIVE-23485
> URL: https://issues.apache.org/jira/browse/HIVE-23485
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23485.01.patch, HIVE-23485.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Consider the following SQL query:
> {code:sql}
> select id, name from person group by id, name;
> {code}
> and assume that the person table contains the following tuples:
> {code:sql}
> insert into person values (0, 'A') ;
> insert into person values (1, 'A') ;
> insert into person values (2, 'B') ;
> insert into person values (3, 'B') ;
> insert into person values (4, 'B') ;
> insert into person values (5, 'C') ;
> {code}
> If we know the number of distinct values (NDV) for all columns in the group 
> by clause then we can infer a lower bound for the total number of rows by 
> taking the maximum NDV of the involved columns. 
> Currently the query in the scenario above has the following plan:
> {noformat}
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_11]
> Group By Operator [GBY_10] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_9]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_8] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:id, name
>   Select Operator [SEL_7] (rows=6 width=92)
> Output:["id","name"]
> TableScan [TS_0] (rows=6 width=92)
>   
> default@person,person,Tbl:COMPLETE,Col:COMPLETE,Output:["id","name"]{noformat}
> Observe that the stats for the group by report 3 rows, but given that the ID 
> attribute is part of the aggregation, the row count cannot be less than 6.
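The bound described above can be sketched in a few lines (a minimal illustration in plain Python, assuming exact NDVs are known; this is not Hive's actual statistics code):

```python
def group_by_row_lower_bound(table, group_by_cols):
    """Lower bound on GROUP BY output rows: the largest NDV
    (number of distinct values) among the grouping columns."""
    return max(len({row[c] for row in table}) for c in group_by_cols)

# The person table from the example above.
person = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "B"},
    {"id": 5, "name": "C"},
]

# NDV(name) = 3, NDV(id) = 6, so the GROUP BY cannot emit fewer than 6 rows.
print(group_by_row_lower_bound(person, ["id", "name"]))  # 6
```

The estimator may still over- or under-shoot for correlated columns, but it can never legitimately report fewer rows than this bound.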





[jira] [Work logged] (HIVE-23485) Bound GroupByOperator stats using largest NDV among columns

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23485?focusedWorklogId=545250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545250
 ]

ASF GitHub Bot logged work on HIVE-23485:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 10:38
Start Date: 01/Feb/21 10:38
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1926:
URL: https://github.com/apache/hive/pull/1926


   





Issue Time Tracking
---

Worklog Id: (was: 545250)
Time Spent: 40m  (was: 0.5h)

> Bound GroupByOperator stats using largest NDV among columns
> ---
>
> Key: HIVE-23485
> URL: https://issues.apache.org/jira/browse/HIVE-23485
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23485.01.patch, HIVE-23485.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider the following SQL query:
> {code:sql}
> select id, name from person group by id, name;
> {code}
> and assume that the person table contains the following tuples:
> {code:sql}
> insert into person values (0, 'A') ;
> insert into person values (1, 'A') ;
> insert into person values (2, 'B') ;
> insert into person values (3, 'B') ;
> insert into person values (4, 'B') ;
> insert into person values (5, 'C') ;
> {code}
> If we know the number of distinct values (NDV) for all columns in the group 
> by clause then we can infer a lower bound for the total number of rows by 
> taking the maximum NDV of the involved columns. 
> Currently the query in the scenario above has the following plan:
> {noformat}
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_11]
> Group By Operator [GBY_10] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_9]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_8] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:id, name
>   Select Operator [SEL_7] (rows=6 width=92)
> Output:["id","name"]
> TableScan [TS_0] (rows=6 width=92)
>   
> default@person,person,Tbl:COMPLETE,Col:COMPLETE,Output:["id","name"]{noformat}
> Observe that the stats for the group by report 3 rows, but given that the ID 
> attribute is part of the aggregation, the row count cannot be less than 6.





[jira] [Updated] (HIVE-24713) HS2 never knows deregistering from Zookeeper in the particular case

2021-02-01 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24713:

Description: 
While using ZooKeeper discovery mode, the problem that HS2 never notices its 
deregistration from ZooKeeper always happens.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected type:NodeDataChanged was 
fired for the path, and that was the end. *DeregisterWatcher must be registered 
again for the same znode to get a future NodeDeleted event.*
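The one-time-trigger semantics and the re-registration fix can be illustrated with a small stand-alone simulation (plain Python, not the actual org.apache.zookeeper API; the path and event names mirror the log above and the registry class is a hypothetical toy model):

```python
class OneShotWatchRegistry:
    """Toy model of ZooKeeper's one-time watch semantics:
    a watch is delivered at most once and removed after delivery."""
    def __init__(self):
        self.watches = {}  # path -> list of callbacks

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def fire(self, path, event):
        for cb in self.watches.pop(path, []):  # watches are consumed here
            cb(event)

log = []

def deregister_watcher(event):
    log.append(event)
    if event != "NodeDeleted":
        # The fix described above: re-register for the same znode so a
        # later NodeDeleted event is still observed.
        registry.watch("/hiveserver2/hs2-of-2", deregister_watcher)

registry = OneShotWatchRegistry()
registry.watch("/hiveserver2/hs2-of-2", deregister_watcher)

# The reconnect fires NodeDataChanged, consuming the one-shot watch; the
# later deregistration fires NodeDeleted, seen only because we re-registered.
registry.fire("/hiveserver2/hs2-of-2", "NodeDataChanged")
registry.fire("/hiveserver2/hs2-of-2", "NodeDeleted")
print(log)  # ['NodeDataChanged', 'NodeDeleted']
```

Without the re-registration inside the callback, the second `fire` would find no watch and the NodeDeleted event would be silently lost, which is exactly the reported behavior.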

  was:
While using zookeeper discovery mode, the problem that HS2 never knows 
deregistering from Zookeeper could always happen.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected type:NodeDataChanged was 
fired for the path, and that was the end. *DeregisterWatcher must be registered 
again for the same znode to get a future NodeDeleted event.*


> HS2 never knows deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using zookeeper discovery mode, the problem that HS2 never knows 
> deregistering from Zookeeper always happens.
> Reproduction is simple.
>  # Find one of the zk servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of ZK server is 3.5.0 or above, it's easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found at 1, say it's 
> _hs2-of-2_
>  # Restart the ZK server found at 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I 

[jira] [Resolved] (HIVE-24503) Optimize vector row serde by avoiding type check at run time

2021-02-01 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24503.

Resolution: Fixed

> Optimize vector row serde by avoiding type check at run time 
> -
>
> Key: HIVE-24503
> URL: https://issues.apache.org/jira/browse/HIVE-24503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Serialization/deserialization of vectorized batches done in VectorSerializeRow 
> and VectorDeserializeRow performs a type check for each column of each row. 
> This becomes very costly when there are billions of rows to read/write. It 
> can be optimized by doing the type check at init time and creating specific 
> reader/writer classes. These classes can then be stored directly in a field 
> structure to avoid run-time type checks.
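The shape of this optimization can be sketched as follows (a Python illustration of resolving per-column writers once at init time instead of type-checking every cell; the writer functions and schema names are invented for the example, not Hive's VectorSerializeRow code):

```python
# Per-row type dispatch: what the issue describes as costly, since the
# isinstance checks run for every cell of every row.
def write_cell_slow(value):
    if isinstance(value, int):
        return ("long", value)
    if isinstance(value, str):
        return ("string", value)
    raise TypeError(type(value))

# Init-time dispatch: resolve each column's writer once from the schema,
# then call it per row with no type checks on the hot path.
WRITERS = {
    "int": lambda v: ("long", v),
    "string": lambda v: ("string", v),
}

def make_row_writer(schema):
    writers = [WRITERS[t] for t in schema]  # resolved once, at init
    def write_row(row):
        return [w(v) for w, v in zip(writers, row)]
    return write_row

write_row = make_row_writer(["int", "string"])
print(write_row([5, "C"]))  # [('long', 5), ('string', 'C')]
```

Both paths produce the same encoding; the second simply hoists the type decision out of the per-row loop, which is what storing specific reader/writer objects in a field achieves in the Java implementation.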





[jira] [Resolved] (HIVE-24589) Drop catalog failing with deadlock error for Oracle backend dbms.

2021-02-01 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera resolved HIVE-24589.

Resolution: Fixed

> Drop catalog failing with deadlock error for Oracle backend dbms.
> -
>
> Key: HIVE-24589
> URL: https://issues.apache.org/jira/browse/HIVE-24589
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When we do a drop catalog we drop the catalog from the CTLGS table. The DBS 
> table has a foreign key reference on CTLGS for CTLG_NAME. This is causing the 
> DBS table to be locked exclusively and causing deadlocks. This can be avoided 
> by creating an index in the DBS table on CTLG_NAME.
> {code:java}
> CREATE INDEX CTLG_NAME_DBS ON DBS(CTLG_NAME); {code}
> {code:java}
>  Oracle Database maximizes the concurrency control of parent keys in relation 
> to dependent foreign keys. Locking behaviour depends on whether foreign key 
> columns are indexed. If foreign keys are not indexed, then the child table 
> will probably be locked more frequently, deadlocks will occur, and 
> concurrency will be decreased. For this reason foreign keys should almost 
> always be indexed. The only exception is when the matching unique or primary 
> key is never updated or deleted.{code}
>  





[jira] [Work logged] (HIVE-24653) Race condition between compactor marker generation and get splits

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24653?focusedWorklogId=545235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545235
 ]

ASF GitHub Bot logged work on HIVE-24653:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 10:12
Start Date: 01/Feb/21 10:12
Worklog Time Spent: 10m 
  Work Description: lcspinter merged pull request #1882:
URL: https://github.com/apache/hive/pull/1882


   





Issue Time Tracking
---

Worklog Id: (was: 545235)
Time Spent: 1h  (was: 50m)

> Race condition between compactor marker generation and get splits
> -
>
> Key: HIVE-24653
> URL: https://issues.apache.org/jira/browse/HIVE-24653
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In a rare scenario it's possible that the compactor moves the files into the 
> final location before creating the compactor marker, so they can be fetched by 
> get splits before the marker is created.
> 2020-09-14 04:55:25,978 [ERROR] ORC_GET_SPLITS #4 |io.AcidUtils|: Failed to 
> read 
> hdfs://host/warehouse/tablespace/managed/hive/database.db/table/partition=x/base_0011535/_metadata_acid:
>  No content to map to Object due to end of input
> java.io.EOFException: No content to map to Object due to end of input





[jira] [Work logged] (HIVE-24653) Race condition between compactor marker generation and get splits

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24653?focusedWorklogId=545233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545233
 ]

ASF GitHub Bot logged work on HIVE-24653:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 10:11
Start Date: 01/Feb/21 10:11
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on pull request #1882:
URL: https://github.com/apache/hive/pull/1882#issuecomment-770739641


   The tests are flaky on this branch and the failures are not related to this 
change.
   Thanks for the patch @asinkovits!





Issue Time Tracking
---

Worklog Id: (was: 545233)
Time Spent: 50m  (was: 40m)

> Race condition between compactor marker generation and get splits
> -
>
> Key: HIVE-24653
> URL: https://issues.apache.org/jira/browse/HIVE-24653
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a rare scenario it's possible that the compactor moves the files into the 
> final location before creating the compactor marker, so they can be fetched by 
> get splits before the marker is created.
> 2020-09-14 04:55:25,978 [ERROR] ORC_GET_SPLITS #4 |io.AcidUtils|: Failed to 
> read 
> hdfs://host/warehouse/tablespace/managed/hive/database.db/table/partition=x/base_0011535/_metadata_acid:
>  No content to map to Object due to end of input
> java.io.EOFException: No content to map to Object due to end of input





[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-01 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276212#comment-17276212
 ] 

Syed Shameerur Rahman commented on HIVE-24584:
--

Hi, [~amagyar]

Unfortunately I am not able to reproduce this. Here is my setup:

Hive version - 3.1.2. I tried this in non-embedded mode, i.e. HMS running in a 
different JVM from HS2, with the SQL DB hosted locally (running on the same node 
where my HS2 and HMS servers are running).

Am I missing anything?


> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}





[jira] [Updated] (HIVE-24713) HS2 never knows deregistering from Zookeeper in the particular case

2021-02-01 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24713:

Description: 
While using ZooKeeper discovery mode, the problem that HS2 never notices its 
deregistration from ZooKeeper can always happen.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected type:NodeDataChanged was 
fired for the path, and that was the end. *DeregisterWatcher must be registered 
again for the same znode to get a future NodeDeleted event.*

  was:
While using zookeeper discovery mode, the problem that HS2 never knows 
deregistering from Zookeeper could always happen.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected 
type:NodeDataChanged,path:hs2-of-2 was fired, and that was the end. 
*DeregisterWatcher must be registered again for the same znode to get a future 
NodeDeleted event.*


> HS2 never knows deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using zookeeper discovery mode, the problem that HS2 never knows 
> deregistering from Zookeeper could always happen.
> Reproduction is simple.
>  # Find one of the zk servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of ZK server is 3.5.0 or above, it's easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found at 1, say it's 
> _hs2-of-2_
>  # Restart the ZK server found at 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> 

[jira] [Updated] (HIVE-24713) HS2 never knows deregistering from Zookeeper in the particular case

2021-02-01 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24713:

Description: 
While using ZooKeeper discovery mode, the problem that HS2 never notices its 
deregistration from ZooKeeper can always happen.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected 
type:NodeDataChanged,path:hs2-of-2 was fired, and that was the end. 
*DeregisterWatcher must be registered again for the same znode to get a future 
NodeDeleted event.*

  was:
While using zookeeper discovery mode, the problem that HS2 never knows 
deregistering from Zookeeper could always happen.

Reproduction is simple.
 # Find one of the zk servers which holds the DeRegisterWatcher watches of HS2 
instances. If the version of ZK server is 3.5.0 or above, it's easily found 
with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
 # Check which HS2 instance is watching on the ZK server found at 1, say it's 
_hs2-of-2_
 # Restart the ZK server found at 1
 # Deregister _hs2-of-2_ with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never knows that it must be shut down because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added some logging to DeRegisterWatcher and checked which events occurred at 
the time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to 
the ZK server was reestablished, state:SyncConnected 
type:NodeDataChanged,path:hs2-of-2 was fired, and that was the end. 
*DeregisterWatcher must be registered again for the same znode to get a future 
NodeDeleted event.*


> HS2 never knows deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using zookeeper discovery mode, the problem that HS2 never knows 
> deregistering from Zookeeper could always happen.
> Reproduction is simple.
>  # Find one of the zk servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of ZK server is 3.5.0 or above, it's easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found at 1, say it's 
> _hs2-of-2_
>  # Restart the ZK server found at 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> 

[jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24221?focusedWorklogId=545219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545219
 ]

ASF GitHub Bot logged work on HIVE-24221:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 09:50
Start Date: 01/Feb/21 09:50
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #1544:
URL: https://github.com/apache/hive/pull/1544


   
   ### What changes were proposed in this pull request?
   
   Use hash(hash(hash(a,b),c),d) instead of hash(a,b,c,d) when constructing
   the multi-col semijoin reducer.
   
   ### Why are the changes needed?
   In order to use fully vectorized execution on multi-col semijoin reducers.
   
   ### Does this PR introduce _any_ user-facing change?
   Only changes in EXPLAIN plans.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestTezPerfCliDriver -Dqfile="query50.q"`
   





Issue Time Tracking
---

Worklog Id: (was: 545219)
Time Spent: 1h 50m  (was: 1h 40m)

> Use vectorizable expression to combine multiple columns in semijoin bloom 
> filters
> -
>
> Key: HIVE-24221
> URL: https://issues.apache.org/jira/browse/HIVE-24221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
> Environment: 
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, multi-column semijoin reducers use an n-ary call to 
> GenericUDFMurmurHash to combine multiple values into one, which is used as an 
> entry to the bloom filter. However, there are no vectorized operators that 
> treat n-ary inputs. The same goes for the vectorized implementation of 
> GenericUDFMurmurHash introduced in HIVE-23976. 
> The goal of this issue is to choose an alternative way of combining multiple 
> values into one to pass to the bloom filter, using only vectorized 
> operators.
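The rewrite in the pull request above can be sketched as follows (a Python illustration of folding a binary hash over the columns; `murmur2` is a stand-in for Hive's GenericUDFMurmurHash, not its actual implementation):

```python
from functools import reduce

def murmur2(a, b):
    """Stand-in binary hash combiner. Hive would use GenericUDFMurmurHash;
    any deterministic 2-argument hash illustrates the rewrite."""
    return hash((a, b))

def nary_hash(*cols):
    # hash(a, b, c, d): one n-ary call, which has no vectorized operator.
    return hash(cols)

def chained_hash(*cols):
    # hash(hash(hash(a, b), c), d): a chain of binary calls only.
    return reduce(murmur2, cols)

key = ("k1", 42, "k3", 7)
# Each call in the chained form takes exactly two arguments, so it can be
# evaluated column by column with vectorized binary-hash operators.
print(chained_hash(*key) == murmur2(murmur2(murmur2("k1", 42), "k3"), 7))  # True
```

The two forms generally produce different hash values; that is fine for a semijoin bloom filter as long as both the build and probe sides compute the key the same way.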





[jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24221?focusedWorklogId=545218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545218
 ]

ASF GitHub Bot logged work on HIVE-24221:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 09:50
Start Date: 01/Feb/21 09:50
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #1544:
URL: https://github.com/apache/hive/pull/1544


   





Issue Time Tracking
---

Worklog Id: (was: 545218)
Time Spent: 1h 40m  (was: 1.5h)

> Use vectorizable expression to combine multiple columns in semijoin bloom 
> filters
> -
>
> Key: HIVE-24221
> URL: https://issues.apache.org/jira/browse/HIVE-24221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
> Environment: 
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, multi-column semijoin reducers use an n-ary call to 
> GenericUDFMurmurHash to combine multiple values into one, which is used as an 
> entry to the bloom filter. However, there are no vectorized operators that 
> treat n-ary inputs. The same goes for the vectorized implementation of 
> GenericUDFMurmurHash introduced in HIVE-23976. 
> The goal of this issue is to choose an alternative way of combining multiple 
> values into one to pass to the bloom filter, using only vectorized 
> operators.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
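One direction consistent with the description above: if the n-ary hash is defined as a left fold of a binary combine step, the same value can be produced by chaining two-argument calls, each of which is a candidate for a vectorized binary expression. A minimal sketch; the `combine` mixer below is a toy stand-in, not Hive's actual GenericUDFMurmurHash:

```python
def combine(h: int, v: int) -> int:
    """Binary combine step; a toy stand-in for a vectorizable two-arg hash."""
    x = (h * 31 + v) & 0xFFFFFFFF
    return x ^ (x >> 16)

def hash_nary(values) -> int:
    """n-ary hash: fold combine over all column values in one call."""
    h = 17  # arbitrary seed for this sketch
    for v in values:
        h = combine(h, v)
    return h

def hash_chained(c0: int, c1: int, c2: int) -> int:
    """Same value via chained binary calls: combine(combine(combine(seed, c0), c1), c2)."""
    return combine(combine(combine(17, c0), c1), c2)

# Because the n-ary hash is literally a fold of the binary step,
# the chained form reproduces it exactly.
assert hash_nary([1, 2, 3]) == hash_chained(1, 2, 3)
```

Each `combine` step takes exactly two columns, so a plan can express the whole combination as a chain of binary expressions without needing any n-ary vectorized operator.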


[jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24221?focusedWorklogId=545217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545217
 ]

ASF GitHub Bot logged work on HIVE-24221:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 09:49
Start Date: 01/Feb/21 09:49
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1544:
URL: https://github.com/apache/hive/pull/1544#issuecomment-770725925


   Close/Reopen to trigger tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545217)
Time Spent: 1.5h  (was: 1h 20m)

> Use vectorizable expression to combine multiple columns in semijoin bloom 
> filters
> -
>
> Key: HIVE-24221
> URL: https://issues.apache.org/jira/browse/HIVE-24221
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
> Environment: 
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, multi-column semijoin reducers use an n-ary call to 
> GenericUDFMurmurHash to combine multiple values into one, which is used as an 
> entry to the bloom filter. However, there are no vectorized operators that 
> handle n-ary inputs. The same goes for the vectorized implementation of 
> GenericUDFMurmurHash introduced in HIVE-23976. 
> The goal of this issue is to choose an alternative way to combine multiple 
> values into one to pass to the bloom filter, using only vectorized 
> operators.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23485) Bound GroupByOperator stats using largest NDV among columns

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23485?focusedWorklogId=545215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545215
 ]

ASF GitHub Bot logged work on HIVE-23485:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 09:48
Start Date: 01/Feb/21 09:48
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1926:
URL: https://github.com/apache/hive/pull/1926#issuecomment-770725017


   Hey @kgyrtkirk , tests are green so it would be nice to get this in before 
conflicts start to emerge again :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545215)
Time Spent: 0.5h  (was: 20m)

> Bound GroupByOperator stats using largest NDV among columns
> ---
>
> Key: HIVE-23485
> URL: https://issues.apache.org/jira/browse/HIVE-23485
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23485.01.patch, HIVE-23485.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Consider the following SQL query:
> {code:sql}
> select id, name from person group by id, name;
> {code}
> and assume that the person table contains the following tuples:
> {code:sql}
> insert into person values (0, 'A') ;
> insert into person values (1, 'A') ;
> insert into person values (2, 'B') ;
> insert into person values (3, 'B') ;
> insert into person values (4, 'B') ;
> insert into person values (5, 'C') ;
> {code}
> If we know the number of distinct values (NDV) for all columns in the group 
> by clause, then we can infer a lower bound for the total number of rows by 
> taking the maximum NDV of the involved columns. 
> Currently the query in the scenario above has the following plan:
> {noformat}
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_11]
> Group By Operator [GBY_10] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_9]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_8] (rows=3 width=92)
>   Output:["_col0","_col1"],keys:id, name
>   Select Operator [SEL_7] (rows=6 width=92)
> Output:["id","name"]
> TableScan [TS_0] (rows=6 width=92)
>   
> default@person,person,Tbl:COMPLETE,Col:COMPLETE,Output:["id","name"]{noformat}
> Observe that the stats for the group by report 3 rows, but given that the ID 
> attribute is part of the aggregation, the row count cannot be less than 6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
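The bound described above can be sketched directly; `ndv` and `group_by_lower_bound` below are illustrative helpers, not Hive's actual stats code:

```python
def ndv(column):
    """Number of distinct values in a column."""
    return len(set(column))

def group_by_lower_bound(columns):
    """Lower bound on distinct (c1, ..., cn) grouping tuples:
    the largest NDV among the grouping columns."""
    return max(ndv(c) for c in columns)

# The person table from the example: 6 distinct ids, 3 distinct names.
ids   = [0, 1, 2, 3, 4, 5]
names = ['A', 'A', 'B', 'B', 'B', 'C']

# The bound is 6 (driven by id), not 3 (names), matching the observation
# that the group-by output cannot have fewer rows than max column NDV.
assert group_by_lower_bound([ids, names]) == 6
# The true number of distinct tuples can never be below the bound.
assert len(set(zip(ids, names))) >= group_by_lower_bound([ids, names])
```

The bound is safe because every distinct value of any single grouping column must appear in at least one output tuple.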


[jira] [Updated] (HIVE-24712) hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit

2021-02-01 Thread liuyan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuyan updated HIVE-24712:
--
Description: 
 When both params are set to false, the result seems incorrect: only 35 rows 
are returned. This was tested on HDP 3.1.5.


--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
--
Map 1 ..  llap SUCCEEDED 33 33 0 0 0 0
Reducer 2 ..  llap SUCCEEDED  4  4 0 0 0 0
Reducer 3 ..  llap SUCCEEDED  4  4 0 0 0 0
Reducer 4 ..  llap SUCCEEDED  1  1 0 0 0 0
--
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 38.23 s
--
INFO  : 
INFO  : Task Execution Summary
INFO  : --
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : --
INFO  :  Map 1  38097.00  0  0  143,997,065   57,447
INFO  :  Reducer 2   9003.00  0  0  57,447   13,108
INFO  :  Reducer 3  0.00  0  0  13,108   35
INFO  :  Reducer 4  0.00  0  0  35   0
INFO  : --
INFO  : 
INFO  : LLAP IO Summary


 

 

set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk  order by cs_sold_date_sk 
limit 200;
--
VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
--
Map 1 ..  llap SUCCEEDED 33 33 0 0 0 0
Reducer 2 ..  llap SUCCEEDED  4  4 0 0 0 0
Reducer 3 ..  llap SUCCEEDED  2  2 0 0 0 0
Reducer 4 ..  llap SUCCEEDED  1  1 0 0 0 0
--
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 36.24 s
--


INFO  : --
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)   GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : --
INFO  :  Map 1  25595.00  0  0  143,997,065   16,703,757
INFO  :  Reducer 2  18556.00  0  0  16,703,757  800
INFO  :  Reducer 3   8018.00  0  0  800  200
INFO  :  Reducer 4  0.00  0  0  200  0
INFO  : --
INFO  : 

  was:
 When Both param set to false , seems the result is not correct, only 35 rows. 
This is tested on HDP 3.1.5

set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk 
limit 200;

--
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
--
Map 1 .. llap SUCCEEDED 33 33 0 0 0 0 
Reducer 2 .. llap SUCCEEDED 4 4 0 0 0 0 
Reducer 3 .. llap SUCCEEDED 4 4 0 0 0 0 
Reducer 4 .. llap SUCCEEDED 1 1 0 0 0 0 
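For reference, the semantics a correct plan must produce, regardless of the hive.map.aggr / hive.optimize.reducededuplication settings: one row per distinct cs_sold_date_sk (up to the limit), each with its distinct order count. A toy reproduction on hypothetical data, using Python's sqlite3 rather than Hive:

```python
import sqlite3

# Small stand-in for the TPC-DS catalog_sales table from the report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_sales (cs_sold_date_sk INT, cs_order_number INT)")
rows = [(1, 10), (1, 10), (1, 11), (2, 20), (3, 30), (3, 31)]
conn.executemany("INSERT INTO catalog_sales VALUES (?, ?)", rows)

result = conn.execute(
    "SELECT cs_sold_date_sk, COUNT(DISTINCT cs_order_number) "
    "FROM catalog_sales GROUP BY cs_sold_date_sk "
    "ORDER BY cs_sold_date_sk LIMIT 200"
).fetchall()

# Every distinct cs_sold_date_sk appears (up to the limit), no matter how
# the engine stages the distinct aggregation across reducers.
assert result == [(1, 2), (2, 1), (3, 2)]
```

In the report above, the suspect configuration returns 35 groups while the comparison run returns 200, which is what makes the first plan look incorrect rather than merely differently staged.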

[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276184#comment-17276184
 ] 

Attila Magyar commented on HIVE-24584:
--

Hi [~srahman], did you manage to reproduce it?

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24713) HS2 never knows deregistering from Zookeeper in the particular case

2021-02-01 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24713:

Summary: HS2 never knows deregistering from Zookeeper in the particular 
case  (was: HS2 never knows the deregistering from Zookeeper in the particular 
case)

> HS2 never knows deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using ZooKeeper discovery mode, the problem that HS2 never learns of 
> its deregistration from ZooKeeper can always happen.
> Reproduction is simple.
>  # Find one of the ZK servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of the ZK server is 3.5.0 or above, it is easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found in step 1, say 
> it's _hs2-of-2_
>  # Restart the ZK server found in step 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked which events 
> occurred at the time of step 3 (restarting the ZK server):
>  # WatchedEvent state:Disconnected type:None path:null
>  # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
>  # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
>  # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
>  path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]
> As the ZK manual says, watches are one-time triggers. When the connection to 
> the ZK server was reestablished, state:SyncConnected 
> type:NodeDataChanged,path:hs2-of-2 was fired, and that was the last 
> notification. 
> *DeregisterWatcher must be registered again for the same znode to get a 
> future NodeDeleted event.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
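The one-time-trigger behavior and the re-registration fix described in the ticket can be modeled in a few lines. This is a toy in-memory stand-in, not the real ZooKeeper client API:

```python
class ToyZnode:
    """Minimal model of a znode whose watches are one-time triggers."""

    def __init__(self):
        self._watches = []

    def watch(self, callback):
        self._watches.append(callback)

    def fire(self, event):
        # One-time triggers: deliver to the current watches, then drop them.
        watches, self._watches = self._watches, []
        for cb in watches:
            cb(event)

events_seen = []

def deregister_watcher(event):
    events_seen.append(event)
    # The essence of the fix: re-register after every event; without this
    # line, the later NodeDeleted event would never be delivered.
    node.watch(deregister_watcher)

node = ToyZnode()
node.watch(deregister_watcher)
node.fire("NodeDataChanged")  # fired when the ZK connection is reestablished
node.fire("NodeDeleted")      # deregistration; seen only because we re-registered

assert events_seen == ["NodeDataChanged", "NodeDeleted"]
```

If `deregister_watcher` did not re-register itself, `events_seen` would stop at `["NodeDataChanged"]`, which mirrors the reported behavior where HS2 never observes its own deregistration.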


[jira] [Updated] (HIVE-24713) HS2 never knows the deregistering from Zookeeper in the particular case

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24713:
--
Labels: pull-request-available  (was: )

> HS2 never knows the deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using ZooKeeper discovery mode, the problem that HS2 never learns of 
> its deregistration from ZooKeeper can always happen.
> Reproduction is simple.
>  # Find one of the ZK servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of the ZK server is 3.5.0 or above, it is easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found in step 1, say 
> it's _hs2-of-2_
>  # Restart the ZK server found in step 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked which events 
> occurred at the time of step 3 (restarting the ZK server):
>  # WatchedEvent state:Disconnected type:None path:null
>  # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
>  # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
>  # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
>  path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]
> As the ZK manual says, watches are one-time triggers. When the connection to 
> the ZK server was reestablished, state:SyncConnected 
> type:NodeDataChanged,path:hs2-of-2 was fired, and that was the last 
> notification. 
> *DeregisterWatcher must be registered again for the same znode to get a 
> future NodeDeleted event.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24713) HS2 never knows the deregistering from Zookeeper in the particular case

2021-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?focusedWorklogId=545193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545193
 ]

ASF GitHub Bot logged work on HIVE-24713:
-

Author: ASF GitHub Bot
Created on: 01/Feb/21 09:26
Start Date: 01/Feb/21 09:26
Worklog Time Spent: 10m 
  Work Description: EugeneChung opened a new pull request #1932:
URL: https://github.com/apache/hive/pull/1932


   
   
   ### What changes were proposed in this pull request?
   
   
   Register ZKDeRegisterWatcher again for NodeDataChanged event after ZK 
connection reestablishment
   
   ### Why are the changes needed?
   
   
   It's a bug. After the ZK connection is reestablished, a NodeDataChanged event 
for the server node occurs. After that, HS2 is never notified again, so it 
eventually never knows it was deregistered from ZooKeeper.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   In my company's environment. I described the reproduction process for the 
problem on the ticket.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545193)
Remaining Estimate: 0h
Time Spent: 10m

> HS2 never knows the deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While using ZooKeeper discovery mode, the problem that HS2 never learns of 
> its deregistration from ZooKeeper can always happen.
> Reproduction is simple.
>  # Find one of the ZK servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of the ZK server is 3.5.0 or above, it is easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found in step 1, say 
> it's _hs2-of-2_
>  # Restart the ZK server found in step 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked which events 
> occurred at the time of step 3 (restarting the ZK server):
>  # WatchedEvent state:Disconnected type:None path:null
>  # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
>  # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
>  # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
>  path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]
> As the ZK manual says, watches are one-time triggers. When the connection to 
> the ZK server was reestablished, state:SyncConnected 
> type:NodeDataChanged,path:hs2-of-2 was fired, and that was the last 
> notification. 
> *DeregisterWatcher must be registered again for the same znode to get a 
> future NodeDeleted event.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24713) HS2 never knows the deregistering from Zookeeper in the particular case

2021-02-01 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24713:

Summary: HS2 never knows the deregistering from Zookeeper in the particular 
case  (was: HS2 never knows the deletion of znode in the particular case)

> HS2 never knows the deregistering from Zookeeper in the particular case
> ---
>
> Key: HIVE-24713
> URL: https://issues.apache.org/jira/browse/HIVE-24713
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
>
> While using ZooKeeper discovery mode, the problem that HS2 never learns of 
> its deregistration from ZooKeeper can always happen.
> Reproduction is simple.
>  # Find one of the ZK servers which holds the DeRegisterWatcher watches of 
> HS2 instances. If the version of the ZK server is 3.5.0 or above, it is easily 
> found with [http://zk-server:8080/commands/watches] (ZK AdminServer feature)
>  # Check which HS2 instance is watching on the ZK server found in step 1, say 
> it's _hs2-of-2_
>  # Restart the ZK server found in step 1
>  # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
>  # _hs2-of-2_ never knows that it must be shut down because the watch event 
> of DeregisterWatcher was already fired at the time of step 3.
> The reason for the problem is explained at 
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked which events 
> occurred at the time of step 3 (restarting the ZK server):
>  # WatchedEvent state:Disconnected type:None path:null
>  # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
>  # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
>  # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
>  path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]
> As the ZK manual says, watches are one-time triggers. When the connection to 
> the ZK server was reestablished, state:SyncConnected 
> type:NodeDataChanged,path:hs2-of-2 was fired, and that was the last 
> notification. 
> *DeregisterWatcher must be registered again for the same znode to get a 
> future NodeDeleted event.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

