[jira] [Work logged] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?focusedWorklogId=580456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580456
 ]

ASF GitHub Bot logged work on HIVE-24998:
-

Author: ASF GitHub Bot
Created on: 10/Apr/21 01:49
Start Date: 10/Apr/21 01:49
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2163:
URL: https://github.com/apache/hive/pull/2163#discussion_r610975447



##
File path: 
ql/src/test/results/clientpositive/llap/cbo_fallback_always_semantic_exception.q.out
##
@@ -6,7 +6,7 @@ POSTHOOK: query: explain select count(*) from src where key <=> 
100
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-Plan not optimized by CBO due to missing feature 
[Less_than_equal_greater_than].
+Plan optimized by CBO.

Review comment:
   Plan will go through CBO now. Similar changes below.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580456)
Time Spent: 0.5h  (was: 20m)

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  
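
For context, IS [NOT] DISTINCT FROM is Hive's null-safe comparison; `A IS NOT DISTINCT FROM B` is equivalent to the `<=>` operator and never returns NULL. A minimal Python sketch of those semantics (toy function, not Hive code):

```python
def is_not_distinct_from(a, b):
    """Null-safe equality: NULL <=> NULL is TRUE, NULL <=> x is FALSE."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

# The failing INSERT computes NOT (0.379 IS NOT DISTINCT FROM 641):
print(not is_not_distinct_from(0.379, 641))  # True
print(is_not_distinct_from(None, None))      # True
print(is_not_distinct_from(None, 641))       # False
```

So the first INSERT should simply store TRUE into c0; the exceptions come from how the expression is compiled, not from its semantics.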



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?focusedWorklogId=580455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580455
 ]

ASF GitHub Bot logged work on HIVE-24998:
-

Author: ASF GitHub Bot
Created on: 10/Apr/21 01:49
Start Date: 10/Apr/21 01:49
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 commented on a change in pull request 
#2163:
URL: https://github.com/apache/hive/pull/2163#discussion_r610975447



##
File path: 
ql/src/test/results/clientpositive/llap/cbo_fallback_always_semantic_exception.q.out
##
@@ -6,7 +6,7 @@ POSTHOOK: query: explain select count(*) from src where key <=> 
100
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
  A masked pattern was here 
-Plan not optimized by CBO due to missing feature 
[Less_than_equal_greater_than].
+Plan optimized by CBO.

Review comment:
   Plan will go through CBO now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580455)
Time Spent: 20m  (was: 10m)

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24998:
---
Reporter: Manthan B Y  (was: Soumyakanti Das)

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Manthan B Y
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24915) Distribute by with sort by clause when used with constant parameter for sort produces wrong result.

2021-04-09 Thread Suprith Chandrashekharachar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suprith Chandrashekharachar reassigned HIVE-24915:
--

Assignee: Zoltan Haindrich  (was: Suprith Chandrashekharachar)

> Distribute by with sort by clause when used with constant parameter for sort 
> produces wrong result.
> ---
>
> Key: HIVE-24915
> URL: https://issues.apache.org/jira/browse/HIVE-24915
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.4
>Reporter: Suprith Chandrashekharachar
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Distribute by with sort by clause when used with constant parameter for sort 
> produces wrong result.
> Example: 
> {code:java}
>  SELECT 
> t.time,
> 'a' as const
>   FROM
> (SELECT 1591819264 as time
> UNION ALL
> SELECT 1591819265 as time) t
>   DISTRIBUTE by const
>   sort by const, t.time
> {code}
> Produces
> |*time*|*const*|
> |NULL|a|
> |NULL|a|
> Instead it should produce (Hive 0.13 produces this):
> |*time*|*const*|
> |*1591819264*|a|
> |*1591819265*|a|
> Incorrect sort columns are used while creating the ReduceSink here: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L9066]
> With the constant propagation optimizer enabled, the constant operator is 
> folded incorrectly, producing wrong results.
>  
> More examples for incorrect behavior:
> {code:java}
>   SELECT 
> t.time,
> 'a' as const,
> t.id
>   FROM
> (SELECT 1591819264 as time, 1 as id
> UNION ALL
> SELECT 1591819265 as time, 2 as id) t
>   DISTRIBUTE by t.time
>   sort by t.time, const, t.id
> {code}
> produces
> |*time*|*const*|*id*|
> |*1591819264*|a|NULL|
> |*1591819265*|a|NULL|
>  
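
For reference, the expected semantics can be sketched directly: a constant SORT BY key contributes nothing to the ordering and must not blank out the other columns. A small Python model with the two rows from the report:

```python
# Rows produced by the UNION ALL subquery in the report: (time, const)
rows = [(1591819264, 'a'), (1591819265, 'a')]

# SORT BY const, t.time within the single DISTRIBUTE BY partition. The
# constant key is the same for every row, so the order is decided by time
# and all column values must be preserved.
expected = sorted(rows, key=lambda r: (r[1], r[0]))
print(expected)  # [(1591819264, 'a'), (1591819265, 'a')]
```

The buggy plan instead emits NULL for the time column, which no valid ordering of the input rows could produce.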



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24999) HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple correlations

2021-04-09 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24999:
--


> HiveSubQueryRemoveRule generates invalid plan for IN subquery with multiple 
> correlations
> 
>
> Key: HIVE-24999
> URL: https://issues.apache.org/jira/browse/HIVE-24999
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The problem can be reproduced with the following query, which at the 
> moment can be found in the {{subquery_in.q}} file:
> {code:sql}
> explain cbo select * from part where p_name IN (select p_name from part p 
> where p.p_size = part.p_size AND part.p_size + 121150 = p.p_partkey );
> {code}
> The plans before and after {{HiveSubQueryRemoveRule}} are shown below:
> {noformat}
> 2021-04-09T14:29:08,031 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] 
> parse.CalcitePlanner: Plan before removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], 
> p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], 
> p_comment=[$8])
>   HiveFilter(condition=[IN($1, {
> HiveProject(p_name=[$1])
>   HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 121150), 
> $0))])
> HiveTableScan(table=[[default, part]], table:alias=[p])
> })])
> HiveTableScan(table=[[default, part]], table:alias=[part])
> 2021-04-09T14:29:08,056 DEBUG [9f8b0342-5609-4917-95a9-e7abc884f619 main] 
> parse.CalcitePlanner: Plan just after removing subquery:
> HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], 
> p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], 
> p_comment=[$8])
>   HiveFilter(condition=[=($1, $12)])
> LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{5}])
>   HiveTableScan(table=[[default, part]], table:alias=[part])
>   HiveProject(p_name=[$1])
> HiveFilter(condition=[AND(=($5, $cor0.p_size), =(+($cor0.p_size, 
> 121150), $0))])
>   HiveTableScan(table=[[default, part]], table:alias=[p])
> {noformat}
> The plan after applying the rule is invalid. The 
> {{HiveFilter(condition=[=($1, $12)])}} above the correlate references columns 
> ($12) from the right input which do not exist since the correlate is of type 
> SEMI. Running the test with {{-Dcalcite.debug}} property enabled raises an 
> {{AssertionError}} when building the {{HiveFilter}}.
> The problem is hidden at the moment since there is a specific hack in 
> {{HiveRelDecorrelator}} that turns this invalid plan into a valid one. This 
> mechanism is very brittle and can break easily, as happened while fixing 
> HIVE-24957.
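
For intuition, the IN subquery above is a correlated semi-join: an outer part row survives iff some inner row p satisfies both correlated predicates and the IN equality. A minimal Python sketch of that semantics, using made-up toy rows:

```python
# Toy rows with only the referenced columns: (p_partkey, p_name, p_size);
# the data is invented for illustration, not taken from the actual part table.
part = [
    (121160, 'almond', 10),
    (121150, 'walnut', 0),
    (999999, 'pecan', 5),
]

def survives(outer, table):
    """True iff the correlated IN subquery keeps this outer row:
    p_name IN (SELECT p_name FROM part p
               WHERE p.p_size = part.p_size
                 AND part.p_size + 121150 = p.p_partkey)"""
    _, name, size = outer
    return any(p_name == name
               for (p_partkey, p_name, p_size) in table
               if p_size == size and size + 121150 == p_partkey)

result = [row for row in part if survives(row, part)]
print(result)  # [(121160, 'almond', 10), (121150, 'walnut', 0)]
```

A semi-join only decides whether each left row survives; its output exposes no right-side columns, which is why the `=($1, $12)` filter above the SEMI correlate is invalid.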



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?focusedWorklogId=580319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580319
 ]

ASF GitHub Bot logged work on HIVE-24998:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 20:51
Start Date: 09/Apr/21 20:51
Worklog Time Spent: 10m 
  Work Description: soumyakanti3578 opened a new pull request #2163:
URL: https://github.com/apache/hive/pull/2163


   
   
   ### What changes were proposed in this pull request?
   `SqlFunctionConverter.java` has been changed to make the IS_DISTINCT_FROM 
operator go through the CBO.
   
   
   ### Why are the changes needed?
   Currently, INSERT statements fail when the IS DISTINCT FROM or IS NOT 
DISTINCT FROM operators are used; this is fixed by letting the plan go 
through the CBO.
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   Added tests to `is_distinct_from.q`, and validated the output.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580319)
Remaining Estimate: 0h
Time Spent: 10m

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24998:
--
Labels: pull-request-available  (was: )

> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24998) IS [NOT] DISTINCT FROM failing with SemanticException

2021-04-09 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das reassigned HIVE-24998:
--


> IS [NOT] DISTINCT FROM failing with SemanticException
> -
>
> Key: HIVE-24998
> URL: https://issues.apache.org/jira/browse/HIVE-24998
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>
> Hive: INSERT statements failing with UDFArgumentException and 
> SemanticException
> Problem Statement:
> {code:java}
> CREATE TABLE t2(c0 boolean , c1 FLOAT );
> INSERT INTO t2(c0) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641));
> -- Insert failing with: Error: Error while compiling statement: FAILED: 
> UDFArgumentException UDF tables only one argument (state=42000,code=4)
> INSERT INTO t2(c0,c1) VALUES (NOT (0.379 IS NOT DISTINCT FROM 641), 0.2);
> -- Insert failing with: SemanticException 0:0 Expected 2 columns for 
> insclause-0/database52@t2; select produces 1 columns. Error encountered near 
> token '0.2' (state=42000,code=4) {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22633) GroupByOperator may throw NullPointerException when setting data skew optimization parameters

2021-04-09 Thread Rentao Wu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318230#comment-17318230
 ] 

Rentao Wu commented on HIVE-22633:
--

Bump, I'm hitting this issue as well.

> GroupByOperator may throw NullPointerException when setting data skew 
> optimization parameters
> -
>
> Key: HIVE-22633
> URL: https://issues.apache.org/jira/browse/HIVE-22633
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1, 4.0.0
>Reporter: zhangbutao
>Priority: Major
>
> If hive.map.aggr and hive.groupby.skewindata are set to true, an exception 
> will be thrown.
> Steps to reproduce:
> 1. create table: 
> set hive.map.aggr=true;
> set hive.groupby.skewindata=true;
> create table test1 (id1 bigint);
> create table test2 (id2 bigint) partitioned by(dt2 string);
> insert into test2 partition(dt2='2020') select a.id1 from test1 a group by 
> a.id1;
> 2.NullPointerException:
> {code:java}
> ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1585641455670_0001_2_03_00_2:java.lang.RuntimeException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFNumericStatsEvaluator.init(GenericUDAFComputeStats.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-09 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24997:
-
Description: Since  HIVE-24230  it assumes the UDF is evaluated on HS2 
which is not true in general. The SessionState is only available at compile 
time evaluation but later on a new interpreter should be instantiated.

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> Since HIVE-24230, the code assumes the UDF is evaluated on HS2, which is not 
> true in general. The SessionState is only available during compile-time 
> evaluation; later on, a new interpreter should be instantiated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-09 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24997:



> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24078) result rows not equal in mr and tez

2021-04-09 Thread liguangyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317961#comment-17317961
 ] 

liguangyu commented on HIVE-24078:
--

This is a problem of execution order:

Evaluated first: WHERE a2.programset_name IS NOT NULL

Applied afterwards: ON a1.program_set_id=a2.programset_id
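
The comment can be restated as a standard equivalence: a WHERE predicate that rejects NULLs on a column from the right side of a LEFT JOIN discards exactly the NULL-extended rows, so the plan may legally execute as an INNER JOIN, but every engine must then produce the same row set. A toy Python sketch of the equivalence (illustrative data, not the tables from the report):

```python
# Toy tables: (program_set_id, payload) and (programset_id, programset_name)
a1 = [(1, 'x'), (2, 'y')]
a2 = [(1, 'show')]

# LEFT JOIN, then WHERE programset_name IS NOT NULL
left = []
for pid, payload in a1:
    matches = [name for (jid, name) in a2 if jid == pid]
    for name in (matches or [None]):          # NULL-extend unmatched rows
        left.append((pid, payload, name))
filtered = [r for r in left if r[2] is not None]

# INNER JOIN yields the same rows once the filter removes NULL-extended ones
inner = [(pid, payload, name) for (pid, payload) in a1
         for (jid, name) in a2 if jid == pid]
print(filtered == inner)  # True
```

If MR and Tez disagree on the row count for such a query, at least one of them is evaluating the filter or join condition incorrectly.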

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Priority: Blocker
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be 
> fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21574) return wrong result when execute left join sql

2021-04-09 Thread liguangyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317957#comment-17317957
 ] 

liguangyu commented on HIVE-21574:
--

It's probably because of 'distinct null'.

When a DISTINCT over multiple fields contains more than one NULL, an error 
will occur.

You can replace the NULLs in advance.
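
As background for the comment above: SQL's COUNT(DISTINCT ...) ignores NULLs, so replacing NULLs with a sentinel value in advance changes what is counted. A minimal Python model of the aggregate (toy data; the sentinel name is arbitrary):

```python
device_ids = ['d1', 'd2', None, 'd1', None]

# SQL's COUNT(DISTINCT device_id) ignores NULLs entirely
count_distinct = len({d for d in device_ids if d is not None})

# Replacing NULLs with a sentinel beforehand counts them as one extra value
count_with_sentinel = len({d if d is not None else '<null>' for d in device_ids})

print(count_distinct, count_with_sentinel)  # 2 3
```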

 

> return wrong result when execute left join sql
> --
>
> Key: HIVE-21574
> URL: https://issues.apache.org/jira/browse/HIVE-21574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
> Environment: hive 3.1.0 hdfs 3.1.1
>Reporter: Panda Song
>Priority: Blocker
>
> Can somebody delete this issue please?
> When I use a table instead of the sub-select, I get the right result; many 
> more rows are joined together (the old_uv metric is bigger!!!)
> Are there any bugs here?
> Please help me, thanks a lot!!
> {code:java}
> select 
> a.event_date,
> count(distinct a.device_id) as uv,
> count(distinct case when b.device_id is not null then b.device_id end) as 
> old_uv,
> count(distinct a.device_id) - count(distinct case when b.device_id is not 
> null then b.device_id end) as new_uv
> from
> (
> select
> event_date,
> device_id,
> qingting_id
> from datacenter.bl_page_chain_day
> where event_date = '2019-03-31'
> and (current_content like '/membership5%'
> or current_content like '/vips/members%'
> or current_content like '/members/v2/%')
> )a
> left join
> (select
>   b.device_id
> from
> lzq_test.first_buy_vip a
> inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
> where a.first_buy < '2019-03-31'
> group by b.device_id
> )b
> on a.device_id = b.device_id
> group by a.event_date;
> {code}
> plan:
> {code:java}
> Plan optimized by CBO. 
> 
>  Vertex dependency in root stage
>  Map 1 <- Map 3 (BROADCAST_EDGE)
>  Reducer 2 <- Map 1 (SIMPLE_EDGE)   
>  Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
>  Reducer 6 <- Reducer 5 (SIMPLE_EDGE)   
> 
>  Stage-0
>Fetch Operator   
>  limit:-1   
>  Stage-1
>Reducer 6
>File Output Operator [FS_26] 
>  Select Operator [SEL_25] (rows=35527639 width=349) 
>Output:["_col0","_col1","_col2","_col3"] 
>Group By Operator [GBY_24] (rows=35527639 width=349) 
>  Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
> KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
><-Reducer 5 [SIMPLE_EDGE]
>  SHUFFLE [RS_23]
>PartitionCols:_col0  
>Group By Operator [GBY_22] (rows=71055278 width=349) 
>  
> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT
>  _col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
>  Select Operator [SEL_20] (rows=71055278 width=349) 
>Output:["_col1","_col2"] 
>Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
>  
> Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
> Outer),Output:["_col0","_col1"] 
><-Reducer 2 [ONE_TO_ONE_EDGE]
>  FORWARD [RS_17]
>PartitionCols:_col0  
>Group By Operator [GBY_12] (rows=21738609 width=235) 
>  Output:["_col0"],keys:KEY._col0 
><-Map 1 [SIMPLE_EDGE]
>  SHUFFLE [RS_11]
>PartitionCols:_col0  
>Group By Operator [GBY_10] (rows=43477219 
> width=235) 
>  Output:["_col0"],keys:_col0 
>  Map Join Operator [MAPJOIN_44] (rows=43477219 
> width=235) 
>
> Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
>  <-Map 3 [BROADCAST_EDGE] 
>BROADCAST [RS_7] 
>  PartitionCols:_col0 
>  Select Operator [SEL_5] (rows=301013 
> width=228) 
>Output:["_col0"] 
>Filter Operator [FIL_32] (rows=301013 
> width=228) 
>  

[jira] [Updated] (HIVE-24996) Conversion of PIG script with multiple store causing the merging of multiple sql statements

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24996:
--
Labels: pull-request-available  (was: )

> Conversion of PIG script with multiple store causing the merging of multiple 
> sql statements
> ---
>
> Key: HIVE-24996
> URL: https://issues.apache.org/jira/browse/HIVE-24996
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SQL writer is not reset after a SQL statement is converted. This causes 
> the next SQL statements to be merged with the previous one.
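
The bug pattern described, a stateful writer reused across conversions without being reset, can be sketched generically; the class and method names below are illustrative, not the actual Hive/Calcite API:

```python
class SqlWriter:
    """Minimal stateful SQL writer: fragments accumulate until reset()."""
    def __init__(self):
        self.buf = []
    def write(self, fragment):
        self.buf.append(fragment)
    def to_sql(self):
        return ' '.join(self.buf)
    def reset(self):
        self.buf = []

w = SqlWriter()
w.write('INSERT INTO t1 SELECT * FROM s1')
first = w.to_sql()

# Without reset(), the second statement is merged with the first:
w.write('INSERT INTO t2 SELECT * FROM s2')
merged = w.to_sql()

# With reset() between statements, each conversion stands alone:
w.reset()
w.write('INSERT INTO t2 SELECT * FROM s2')
second = w.to_sql()
print(second)  # INSERT INTO t2 SELECT * FROM s2
```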



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24996) Conversion of PIG script with multiple store causing the merging of multiple sql statements

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24996?focusedWorklogId=579941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579941
 ]

ASF GitHub Bot logged work on HIVE-24996:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 12:18
Start Date: 09/Apr/21 12:18
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #2390:
URL: https://github.com/apache/calcite/pull/2390


   The SQL writer is not reset after a SQL statement is converted. This causes 
the next SQL statements to be merged with the previous one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579941)
Remaining Estimate: 0h
Time Spent: 10m

> Conversion of PIG script with multiple store causing the merging of multiple 
> sql statements
> ---
>
> Key: HIVE-24996
> URL: https://issues.apache.org/jira/browse/HIVE-24996
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The SQL writer is not reset after a SQL statement is converted. This causes 
> the next SQL statements to be merged with the previous one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24740) Invalid table alias or column reference: Can't order by an unselected column

2021-04-09 Thread liguangyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317942#comment-17317942
 ] 

liguangyu commented on HIVE-24740:
--

The alias column1 could not be found.

Execution sequence: GROUP BY, SELECT, ORDER BY.

Both of the following queries work:

select substr(column1,1,4) column1, avg(column1) from t1 group by 
substr(column1,1,4) order by column1;

select substr(column1,1,4) column1, avg(column1) from t1 group by column1 order 
by column1;
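
The comment describes HiveQL's name-resolution order: ORDER BY is resolved against the SELECT output, so it can see a select-list alias, but without such an alias the raw column is no longer available after GROUP BY. A toy Python model of that resolution (hypothetical structures, not Hive code):

```python
# After GROUP BY substr(column1,1,4), the select list exposes these names:
group_output = ['_c0', '_c1']         # no alias, as in the failing query
aliased_output = ['column1', '_c1']   # with "substr(column1,1,4) column1"

def resolve_order_by(col, select_output):
    """ORDER BY resolves against the SELECT output columns and aliases."""
    if col not in select_output:
        raise ValueError(f"Invalid table alias or column reference '{col}'")
    return select_output.index(col)

print(resolve_order_by('column1', aliased_output))  # 0
# resolve_order_by('column1', group_output) raises, matching the reported error
```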

 

 

> Invalid table alias or column reference: Can't order by an unselected column
> 
>
> Key: HIVE-24740
> URL: https://issues.apache.org/jira/browse/HIVE-24740
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleksiy Sayankin
>Priority: Blocker
>
> {code}
> CREATE TABLE t1 (column1 STRING);
> {code}
> {code}
> select substr(column1,1,4), avg(column1) from t1 group by substr(column1,1,4) 
> order by column1;
> {code}
> {code}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:87 Invalid table 
> alias or column reference 'column1': (possible column names are: _c0, _c1, 
> .(tok_function substr (tok_table_or_col column1) 1 4), .(tok_function avg 
> (tok_table_or_col column1)))
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5645)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5576)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.getOrderByExpression(CalcitePlanner.java:4326)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.beginGenOBLogicalPlan(CalcitePlanner.java:4230)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genOBLogicalPlan(CalcitePlanner.java:4136)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5326)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1864)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1810)
>   at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1571)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:562)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12538)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HIVE-24573) hive 3.1.2 drop table Sometimes it can't be deleted

2021-04-09 Thread liguangyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317939#comment-17317939
 ] 

liguangyu commented on HIVE-24573:
--

Could there be a space in the table name?

> hive 3.1.2 drop table Sometimes it can't be deleted
> ---
>
> Key: HIVE-24573
> URL: https://issues.apache.org/jira/browse/HIVE-24573
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: paul
>Priority: Blocker
>
> Executing a "drop table if exists trade_4_Temp448" statement sometimes fails
> to delete the table. hive.log shows:
>   2020-12-29T07:30:04,840 ERROR [HiveServer2-Background-Pool: Thread-6483] 
> metadata.Hive: Table dc_usermanage.trade_3_temp448 not found: 
> hive.dc_usermanage.trade_3_temp448 table not found
>  
> The statement itself returns success.
>  
> I suspect this problem only arises under high concurrency. We run a large
> number of tasks every day, and one or two of them hit this issue each day.
>  
> The metastore backend is MySQL.
>  





[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=579929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579929
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 11:53
Start Date: 09/Apr/21 11:53
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1778:
URL: https://github.com/apache/hive/pull/1778


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579929)
Time Spent: 40m  (was: 0.5h)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is reported to offer performance improvements over Netty3. 
> However, the refactor is not trivial; TEZ-4157 covers it more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html





[jira] [Assigned] (HIVE-24996) Conversion of PIG script with multiple store causing the merging of multiple sql statements

2021-04-09 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24996:
--


> Conversion of PIG script with multiple store causing the merging of multiple 
> sql statements
> ---
>
> Key: HIVE-24996
> URL: https://issues.apache.org/jira/browse/HIVE-24996
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> The SQL writer is not reset after a SQL statement is converted. This causes 
> the next SQL statements to be merged with the previous one.
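The fix described above amounts to clearing the writer's buffer after each statement is emitted. A minimal illustrative sketch of the bug and the fix follows; the class and method names are hypothetical and do not correspond to the actual Hive/Pig converter API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a shared StringBuilder that is not reset between
// statements causes later statements to contain the earlier ones.
class SqlWriterSketch {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> statements = new ArrayList<>();

    void append(String fragment) {
        buffer.append(fragment);
    }

    // The fix: capture the finished statement, then reset the buffer.
    void finishStatement() {
        statements.add(buffer.toString());
        buffer.setLength(0); // without this line, statements merge together
    }

    List<String> getStatements() {
        return statements;
    }
}
```

With the `buffer.setLength(0)` call removed, the second recorded statement would contain the first one prepended, which is the merging behaviour this issue describes.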





[jira] [Work logged] (HIVE-24958) Create Iceberg catalog module in Hive

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24958?focusedWorklogId=579891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579891
 ]

ASF GitHub Bot logged work on HIVE-24958:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 10:36
Start Date: 09/Apr/21 10:36
Worklog Time Spent: 10m 
  Work Description: marton-bod edited a comment on pull request #2138:
URL: https://github.com/apache/hive/pull/2138#issuecomment-816589194


   > Generally looks good to me.
   > A few nits only. Did we remove all the "hacks" we did for the 
iceberg-handler module which were needed until we have the metastore module?
   
   Thanks for the review! Yes, the only hack we could remove with this one was 
this:
   
https://github.com/apache/hive/pull/2138/files#diff-9d31f72cca8ab08ae120d321c1b58816336936a23a4ef720ad65e79d3acbe743L357
   (and also the removal of `TestHiveMetastore` from the handler module)
   Most other hacks depend on Iceberg 0.12.0 coming out.




Issue Time Tracking
---

Worklog Id: (was: 579891)
Time Spent: 40m  (was: 0.5h)

> Create Iceberg catalog module in Hive
> -
>
> Key: HIVE-24958
> URL: https://issues.apache.org/jira/browse/HIVE-24958
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-catalog module in Hive, with the code currently 
> contained in Iceberg's iceberg-hive-metastore module
>  * Make sure all tests pass (including static analysis and checkstyle)
>  * Make iceberg-handler depend on this module instead of 
> iceberg-hive-metastore





[jira] [Work logged] (HIVE-24958) Create Iceberg catalog module in Hive

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24958?focusedWorklogId=579886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579886
 ]

ASF GitHub Bot logged work on HIVE-24958:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 10:32
Start Date: 09/Apr/21 10:32
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2138:
URL: https://github.com/apache/hive/pull/2138#issuecomment-816589194


   > Generally looks good to me.
   > A few nits only. Did we remove all the "hacks" we did for the 
iceberg-handler module which were needed until we have the metastore module?
   
   Thanks for the review! Yes, the only hack we could remove with this one was 
this:
   
https://github.com/apache/hive/pull/2138/files#diff-9d31f72cca8ab08ae120d321c1b58816336936a23a4ef720ad65e79d3acbe743L357
   Most other hacks depend on Iceberg 0.12.0 coming out.




Issue Time Tracking
---

Worklog Id: (was: 579886)
Time Spent: 0.5h  (was: 20m)

> Create Iceberg catalog module in Hive
> -
>
> Key: HIVE-24958
> URL: https://issues.apache.org/jira/browse/HIVE-24958
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> * Create a new iceberg-catalog module in Hive, with the code currently 
> contained in Iceberg's iceberg-hive-metastore module
>  * Make sure all tests pass (including static analysis and checkstyle)
>  * Make iceberg-handler depend on this module instead of 
> iceberg-hive-metastore





[jira] [Assigned] (HIVE-24995) Add support for complex type operator in Join with non equality condition

2021-04-09 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24995:
--


> Add support for complex type operator in Join with non equality condition 
> --
>
> Key: HIVE-24995
> URL: https://issues.apache.org/jira/browse/HIVE-24995
> Project: Hive
>  Issue Type: Sub-task
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>
> This subtask is specifically to support non-equality comparisons (greater 
> than, less than, etc.) as join conditions. 





[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=579868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579868
 ]

ASF GitHub Bot logged work on HIVE-24978:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 09:43
Start Date: 09/Apr/21 09:43
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2154:
URL: https://github.com/apache/hive/pull/2154#discussion_r609321577



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/DropPartitionEvent.java
##
@@ -45,6 +45,14 @@ public DropPartitionEvent (Table table,
 this.deleteData = deleteData;
   }
 
+  public DropPartitionEvent(Table table, Iterable<Partition> partition, 
boolean status, boolean deleteData,
+  IHMSHandler handler) {
+super(status, handler);
+this.table = table;
+this.partitions = partition;

Review comment:
   rename to partitions

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/drop/AlterTableDropPartitionOperation.java
##
@@ -120,6 +126,12 @@ private void dropPartitions() throws HiveException {
 List droppedPartitions = 
context.getDb().dropPartitions(tablenName.getDb(), tablenName.getTable(),
 partitionExpressions, options);
 
+if (isRepl) {

Review comment:
   what is this check






Issue Time Tracking
---

Worklog Id: (was: 579868)
Time Spent: 20m  (was: 10m)

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Presently one event is created for every dropped partition; optimise by 
> merging them to reduce the number of calls to HMS.





[jira] [Work logged] (HIVE-24746) PTF: TimestampValueBoundaryScanner can be optimised during range computation

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24746?focusedWorklogId=579856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579856
 ]

ASF GitHub Bot logged work on HIVE-24746:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 09:11
Start Date: 09/Apr/21 09:11
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1950:
URL: https://github.com/apache/hive/pull/1950#issuecomment-816542489


   @rbalamohan : just squashed the commits together, can we merge this?
   you can see perf jmh benchmark results above
   




Issue Time Tracking
---

Worklog Id: (was: 579856)
Time Spent: 1h 50m  (was: 1h 40m)

> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> 
>
> Key: HIVE-24746
> URL: https://issues.apache.org/jira/browse/HIVE-24746
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot due to Timestamp 
> comparisons: the entire Timestamp object has to be constructed via the OI 
> (which incurs LocalTime computation etc. internally).
>  
> All of this is done for an "equals" comparison that could instead use the 
> "seconds & nanoseconds" fields already present in Timestamp.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852]
>  
>  
> The request is to explore optimising this code path so that equals() can be 
> performed on "seconds/nanoseconds" instead of the entire timestamp.
>  
> {noformat}
> at 
> org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
>   at 
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
>   at 
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>  {noformat}
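The optimisation requested above can be sketched as comparing the underlying seconds/nanoseconds fields directly, so no Timestamp (and no internal LocalDateTime) needs to be materialised per comparison. This is an illustrative sketch under that assumption, not Hive's actual `ValueBoundaryScanner` code:

```java
// Illustrative sketch: equality and ordering on (epoch seconds, nanos-of-second)
// pairs, avoiding construction of a full Timestamp object per comparison.
class TimestampCompareSketch {
    // Two instants are equal iff both their epoch seconds and their
    // nanos-of-second match; no object allocation is needed.
    static boolean isEqual(long seconds1, int nanos1, long seconds2, int nanos2) {
        return seconds1 == seconds2 && nanos1 == nanos2;
    }

    // Full ordering on the same representation, usable for range scans.
    static int compare(long seconds1, int nanos1, long seconds2, int nanos2) {
        int c = Long.compare(seconds1, seconds2);
        return c != 0 ? c : Integer.compare(nanos1, nanos2);
    }
}
```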





[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579853
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 09:05
Start Date: 09/Apr/21 09:05
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r610468452



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java
##
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {
+// For List, all elements will have same type, so only one comparator is 
sufficient.
+HiveWritableComparator comparator = null;
+
+@Override
+public int compare(Object key1, Object key2) {
+ArrayList a1 = (ArrayList) key1;
+ArrayList a2 = (ArrayList) key2;
+if (a1.size() != a2.size()) {
+return a1.size() > a2.size() ? 1 : -1;
+}
+if (a1.size() == 0) {
+return 0;
+}
+
+if (comparator == null) {
+// For List, all elements should be of same type.
+comparator = HiveWritableComparator.get(a1.get(0));
+}
+
+int result = 0;
+for (int i = 0; i < a1.size(); i++) {
+result = comparator.compare(a1.get(i), a2.get(i));
+if (result != 0) {
+return result;
+}
+}
+return result;
+}
+}
+
+class HiveStructComparator extends HiveWritableComparator {
+HiveWritableComparator[] comparator = null;
+
+@Override
+public int compare(Object key1, Object key2) {
+ArrayList a1 = (ArrayList) key1;
+ArrayList a2 = (ArrayList) key2;
+if (a1.size() != a2.size()) {
+return a1.size() > a2.size() ? 1 : -1;
+}
+if (a1.size() == 0) {
+return 0;
+}
+if (comparator == null) {
+comparator = new HiveWritableComparator[a1.size()];
+// For struct all elements may not be of same type, so create 
comparator for each entry.
+for (int i = 0; i < a1.size(); i++) {
+comparator[i] = HiveWritableComparator.get(a1.get(i));
+}
+}
+int result = 0;
+for (int i = 0; i < a1.size(); i++) {
+result = comparator[i].compare(a1.get(i), a2.get(i));
+if (result != 0) {
+return result;
+}
+}
+return result;
+}
+}
+
+class HiveMapComparator extends HiveWritableComparator {
+HiveWritableComparator comparatorValue = null;
+HiveWritableComparator comparatorKey = null;
+
+@Override
+public int compare(Object key1, Object key2) {
+LinkedHashMap map1 = (LinkedHashMap) key1;
+LinkedHashMap map2 = (LinkedHashMap) key2;
+if (map1.entrySet().size() != map2.entrySet().size()) {
+return map1.entrySet().size() > map2.entrySet().size() ? 1 : -1;
+}
+if (map1.entrySet().size() == 0) {
+return 0;
+}
+
+if (comparatorKey == null) {
+comparatorKey = 
HiveWritableComparator.get(map1.keySet().iterator().next());
+comparatorValue = 
HiveWritableComparator.get(map1.values().iterator().next());
+}
+
+int result = comparatorKey.compare(map1.keySet().iterator().next(),
+map2.keySet().iterator().next());
+if (result != 0) {
+return result;
+}
+return comparatorValue.compare(map1.values().iterator().next(), 
map2.values().iterator().next());
+}
+}
+
+public class HiveWritableComparator extends WritableComparator {

Review comment:
   done

##
File path: 

[jira] [Work logged] (HIVE-24994) get_aggr_stats_for call fail with "Tried to send an out-of-range integer"

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24994?focusedWorklogId=579845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579845
 ]

ASF GitHub Bot logged work on HIVE-24994:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 08:39
Start Date: 09/Apr/21 08:39
Worklog Time Spent: 10m 
  Work Description: vnhive commented on a change in pull request #2162:
URL: https://github.com/apache/hive/pull/2162#discussion_r610452112



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java
##
@@ -203,7 +203,7 @@ public boolean isTableNotExistsError(SQLException e) {
* Whether the RDBMS has restrictions on IN list size (explicit, or poor 
perf-based).
*/
   protected boolean needsInBatching() {
-return isORACLE() || isSQLSERVER();
+return isORACLE() || isSQLSERVER() || isPOSTGRES();

Review comment:
   Setting hive.metastore.direct.sql.batch.size should have the same effect 
as this fix. We can just ask the customer to set this configuration variable 
instead of needing this fix.
   
   This was the reason I did not submit a PR with this fix in the first place.






Issue Time Tracking
---

Worklog Id: (was: 579845)
Time Spent: 20m  (was: 10m)

> get_aggr_stats_for call fail with "Tried to send an out-of-range integer"
> -
>
> Key: HIVE-24994
> URL: https://issues.apache.org/jira/browse/HIVE-24994
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> aggrColStatsForPartitions call fail with the Postgres LIMIT if the no of 
> partitions passed in the direct sql goes beyond the 32767
> {code:java}
> postgresql.util.PSQLException: An I/O error occurred while sending to the 
> backend.
>  at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:337) 
> ~[postgresql-42.2.8.jar:42.2.8]
>  at 
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446) 
> ~[postgresql-42.2.8.jar:42.2.8]
>  at 
> org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370) 
> ~[postgresql-42.2.8.jar:42.2.8]
>  at 
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
>  ~[postgresql-42.2.8.jar:42.2.8]
>  at 
> org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
>  ~[postgresql-42.2.8.jar:42.2.8]
>  at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
>  ~[HikariCP-2.6.1.jar:?]
>  at 
> com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
>  [HikariCP-2.6.1.jar:?]
>  at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375)
>  [datanucleus-rdbms-4.1.19.jar:?]
>  at 
> org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552)
>  [datanucleus-rdbms-4.1.19.jar:?]
>  at 
> org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) 
> [datanucleus-rdbms-4.1.19.jar:?]
>  at 
> org.datanucleus.store.query.Query.executeQuery(Query.java:1855) 
> [datanucleus-core-4.1.17.jar:?]
>  at 
> org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807)
>  [datanucleus-rdbms-4.1.19.jar:?]
>  at 
> org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) 
> [datanucleus-api-jdo-4.2.4.jar:?]
>  at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) 
> [datanucleus-api-jdo-4.2.4.jar:?]
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2058)
>  [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4]
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2050)
>  [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4]
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$1500(MetaStoreDirectSql.java:110)
>  [hive-exec-3.1.0.3.1.5.6019-4.jar:3.1.0.3.1.5.6019-4]
>  at 
> 

[jira] [Work logged] (HIVE-24985) Create new metrics about locks

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24985?focusedWorklogId=579841=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579841
 ]

ASF GitHub Bot logged work on HIVE-24985:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 08:01
Start Date: 09/Apr/21 08:01
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2158:
URL: https://github.com/apache/hive/pull/2158#discussion_r610427699



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -415,60 +415,64 @@ public void testDBMetrics() throws Exception {
 String dbName = "default";
 String tblName = "dcamc";
 Table t = newTable(dbName, tblName, false);
-burnThroughTransactions(t.getDbName(), t.getTableName(), 24);
 
-// create and commit txn with non-empty txn_components
+long start = System.currentTimeMillis() - 1000L;

Review comment:
   Why subtract a second here?






Issue Time Tracking
---

Worklog Id: (was: 579841)
Time Spent: 40m  (was: 0.5h)

> Create new metrics about locks
> --
>
> Key: HIVE-24985
> URL: https://issues.apache.org/jira/browse/HIVE-24985
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Basic metrics that can help investigate lock-related issues.
> Ideas:
> *  number of locks
> * oldest lock's age
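Both metrics suggested above can be derived directly from lock acquisition timestamps. A hedged sketch follows (the representation of locks as a list of acquisition times is an assumption for illustration, not the actual Hive metrics code):

```java
import java.util.List;

class LockMetricsSketch {
    // Metric 1: number of currently held locks.
    static int lockCount(List<Long> acquiredAtMillis) {
        return acquiredAtMillis.size();
    }

    // Metric 2: age in milliseconds of the oldest lock, or 0 if none are held.
    static long oldestLockAgeMillis(List<Long> acquiredAtMillis, long nowMillis) {
        long oldest = nowMillis;
        for (long t : acquiredAtMillis) {
            oldest = Math.min(oldest, t);
        }
        return nowMillis - oldest;
    }
}
```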





[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=579836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579836
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 07:45
Start Date: 09/Apr/21 07:45
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610417708



##
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf 
jobConf, Deserializer dese
 return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+Map<String, String> stats = new HashMap<>();
+if (table.currentSnapshot() != null) {
+  Map<String, String> summary = table.currentSnapshot().summary();
+  if (summary != null) {
+if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+  stats.put(StatsSetupConst.NUM_FILES, 
summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+}
+if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+  stats.put(StatsSetupConst.ROW_COUNT, 
summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+}
+// TODO: add TOTAL_SIZE when iceberg 0.12 is released
+if (summary.containsKey("total-files-size")) {
+  stats.put(StatsSetupConst.TOTAL_SIZE, 
summary.get("total-files-size"));
+}
+  }
+} else {
+  stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
   Is this for empty table, or when we do not have statistics at hand?
   We might want to handle the situation when we do not have statistics 
calculated yet, or we have an incomplete table info.
   
   On the Iceberg dev list I have seen this conversation:
   
https://mail-archives.apache.org/mod_mbox/iceberg-dev/202104.mbox/%3c9a11adb4-27d8-40f1-8141-531287c03...@gmail.com%3e
   
   > So the tldr, Missing is OK, but inaccurate is not






Issue Time Tracking
---

Worklog Id: (was: 579836)
Time Spent: 5.5h  (was: 5h 20m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=579835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579835
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 07:40
Start Date: 09/Apr/21 07:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610415034



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##
@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);

Review comment:
   If the table is partitioned, this would not provide enough information for the StorageHandler to generate partition-related statistics.
   We should either document this or pass some info to the StorageHandler so that it can calculate partition statistics.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579835)
Time Spent: 5h 20m  (was: 5h 10m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=579834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579834
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 07:38
Start Date: 09/Apr/21 07:38
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610413396



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -197,4 +197,22 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  /**
+   * Return some basic statistics (numRows, numFiles, totalSize) calculated by the underlying storage handler
+   * implementation.
+   * @param tableDesc a valid table description, used to load the table
+   * @return map of basic statistics, can be null
+   */
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    return null;
+  }
+
+  /**
+   * Check if the storage handler can provide basic statistics.
+   * @return true if the storage handler can supply the basic statistics
+   */
+  default boolean canProvideBasicStatistics() {

Review comment:
   Ok.. I see why it is separated...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579834)
Time Spent: 5h 10m  (was: 5h)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24981 started by Ádám Szita.
-
> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=579827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579827
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 07:35
Start Date: 09/Apr/21 07:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610411498



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##
@@ -197,4 +197,22 @@ default boolean addDynamicSplitPruningEdge(ExprNodeDesc syntheticFilterPredicate
   default Map<String, String> getOperatorDescProperties(OperatorDesc operatorDesc, Map<String, String> initialProps) {
     return initialProps;
   }
+
+  /**
+   * Return some basic statistics (numRows, numFiles, totalSize) calculated by the underlying storage handler
+   * implementation.
+   * @param tableDesc a valid table description, used to load the table
+   * @return map of basic statistics, can be null
+   */
+  default Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    return null;
+  }
+
+  /**
+   * Check if the storage handler can provide basic statistics.
+   * @return true if the storage handler can supply the basic statistics
+   */
+  default boolean canProvideBasicStatistics() {

Review comment:
   Do we need both methods?
   Wouldn't it be better to handle `null` from `getBasicStatistics()` as 
`!canProvideBasicStatistics()`?
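
   One plausible reason for keeping the two methods separate (an assumption for illustration, not taken from the thread): the boolean probe is cheap, while building the statistics map may require loading table metadata, so callers can skip the expensive call entirely. A minimal sketch with an invented `StatsCapable` interface, not the actual Hive API:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical mini-interface mirroring the two-method pattern under discussion.
interface StatsCapable {
    // Cheap capability probe: no table loading required.
    default boolean canProvideBasicStatistics() {
        return false;
    }

    // Potentially expensive: may need to load table metadata. May return null.
    default Map<String, String> getBasicStatistics() {
        return null;
    }
}

public class CapabilityCheckSketch {
    public static void main(String[] args) {
        StatsCapable handler = new StatsCapable() {
            @Override
            public boolean canProvideBasicStatistics() {
                return true;
            }

            @Override
            public Map<String, String> getBasicStatistics() {
                return Collections.singletonMap("numRows", "7");
            }
        };
        // Callers probe first and only then pay for the expensive call.
        Map<String, String> stats = handler.canProvideBasicStatistics()
                ? handler.getBasicStatistics() : Collections.emptyMap();
        System.out.println(stats); // {numRows=7}
    }
}
```

   Collapsing the two into "null means not supported" would force every caller to invoke the expensive path just to discover the answer.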




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579827)
Time Spent: 5h  (was: 4h 50m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=579825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579825
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 07:33
Start Date: 09/Apr/21 07:33
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r610410409



##
File path: iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
##
@@ -92,6 +97,11 @@
   Types.TimestampType.withoutZone(), Types.StringType.get(), 
Types.BinaryType.get(),
   Types.DecimalType.of(3, 1), Types.UUIDType.get(), 
Types.FixedType.ofLength(5),
   Types.TimeType.get());
+  private static final Map<String, String> STATS_MAPPING = ImmutableMap.of(

Review comment:
   nit: maybe newline




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579825)
Time Spent: 4h 50m  (was: 4h 40m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579784
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:10
Start Date: 09/Apr/21 06:10
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r610369650



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java
##
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {
+  // For List, all elements will have the same type, so only one comparator is sufficient.
+  HiveWritableComparator comparator = null;
+
+  @Override
+  public int compare(Object key1, Object key2) {
+    ArrayList a1 = (ArrayList) key1;
+    ArrayList a2 = (ArrayList) key2;
+    if (a1.size() != a2.size()) {
+      return a1.size() > a2.size() ? 1 : -1;
+    }
+    if (a1.size() == 0) {
+      return 0;
+    }
+
+    if (comparator == null) {
+      // For List, all elements should be of the same type.
+      comparator = HiveWritableComparator.get(a1.get(0));
+    }
+
+    int result = 0;
+    for (int i = 0; i < a1.size(); i++) {
+      result = comparator.compare(a1.get(i), a2.get(i));
+      if (result != 0) {
+        return result;
+      }
+    }
+    return result;
+  }
+}
+
+class HiveStructComparator extends HiveWritableComparator {
+  HiveWritableComparator[] comparator = null;
+
+  @Override
+  public int compare(Object key1, Object key2) {
+    ArrayList a1 = (ArrayList) key1;
+    ArrayList a2 = (ArrayList) key2;
+    if (a1.size() != a2.size()) {

Review comment:
   Yes, added null check for all.
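
   The ordering implemented by `HiveListComparator` above can be sketched without Hive's Writable machinery: shorter lists sort first, and equal-length lists are compared element by element. This is an illustration of the quoted logic, not the committed code:

```java
import java.util.Arrays;
import java.util.List;

public class ListOrderingSketch {

    // Size-first, then element-wise comparison, mirroring the quoted logic.
    static <T extends Comparable<T>> int compareLists(List<T> a, List<T> b) {
        if (a.size() != b.size()) {
            return a.size() > b.size() ? 1 : -1; // shorter list sorts first
        }
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c; // first differing element decides
            }
        }
        return 0; // identical contents
    }

    public static void main(String[] args) {
        System.out.println(compareLists(Arrays.asList(1, 2), Arrays.asList(1, 2, 3)) < 0); // true
        System.out.println(compareLists(Arrays.asList(1, 3), Arrays.asList(1, 2)) > 0);    // true
        System.out.println(compareLists(Arrays.asList(1, 2), Arrays.asList(1, 2)));        // 0
    }
}
```

   Note that this size-first ordering is not lexicographic: [1, 9] sorts before [1, 2, 0] because it is shorter, whereas a lexicographic order would put [1, 2, 0] first. It mirrors the quoted implementation rather than any SQL-mandated semantics.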




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579784)
Time Spent: 1h 20m  (was: 1h 10m)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579783
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:10
Start Date: 09/Apr/21 06:10
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r610369511



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##
@@ -2594,7 +2594,9 @@ private boolean validateMapJoinDesc(MapJoinDesc desc) {
       return false;
     }
     List<ExprNodeDesc> keyExprs = desc.getKeys().get(posBigTable);
-    if (!validateExprNodeDesc(keyExprs, "Key")) {
+    if (!validateExprNodeDescNoComplex(keyExprs, "Key")) {

Review comment:
   The lines above check the filter expression. The issue is with the join keys.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579783)
Time Spent: 1h 10m  (was: 1h)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579781=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579781
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:07
Start Date: 09/Apr/21 06:07
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2071:
URL: https://github.com/apache/hive/pull/2071#issuecomment-816434173


   > 1. Which join operators are we targeting ? Checking the 
`CommonJoinOperator` hierarchy I see a few classes that were not affected by 
your changes (e.g., `MapJoinOperator`, `JoinOperator`, and `VectorXXX`) and I 
am wondering if that is normal. Do they already support complex types? Should 
they support complex types in the future?
   
   As of now, hash-based joins are working fine. This patch fixes the issue with SMB and common merge joins.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579781)
Time Spent: 1h  (was: 50m)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579780
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:06
Start Date: 09/Apr/21 06:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2071:
URL: https://github.com/apache/hive/pull/2071#issuecomment-816433622


   > 2\. Which kind of joins are we tackling? Apart from equality joins (`=`) 
there are more operators that can appear such as (`<>,<,>,<=,>=`, etc), what 
happens with them?
   
   Thanks for pointing this out. I will create a separate Jira, as currently only the equality operator is supported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579780)
Time Spent: 50m  (was: 40m)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579779
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:05
Start Date: 09/Apr/21 06:05
Worklog Time Spent: 10m 
  Work Description: maheshk114 edited a comment on pull request #2071:
URL: https://github.com/apache/hive/pull/2071#issuecomment-816432936


   > 3\. What are the semantics of the comparisons? Are we following the SQL 
standard?
   
   I am not aware of any SQL standard for complex type comparison. The join ordering used follows the normal comparison: equality is checked across fields from left to right.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579779)
Time Spent: 40m  (was: 0.5h)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

2021-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579778
 ]

ASF GitHub Bot logged work on HIVE-24883:
-

Author: ASF GitHub Bot
Created on: 09/Apr/21 06:04
Start Date: 09/Apr/21 06:04
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2071:
URL: https://github.com/apache/hive/pull/2071#issuecomment-816432936


   > 3\. What are the semantics of the comparisons? Are we following the SQL 
standard?
   
   I am not aware of any SQL standard for complex type comparison. The join ordering used follows the normal comparison: equality is checked across fields from left to right.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 579778)
Time Spent: 0.5h  (was: 20m)

> Add support for complex types columns in Hive Joins
> ---
>
> Key: HIVE-24883
> URL: https://issues.apache.org/jira/browse/HIVE-24883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array type columns as the comparison functions 
> are not able to handle array type columns.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)