[jira] [Created] (HIVE-24274) Implement Query Text based MaterializedView rewrite

2020-10-14 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24274:
-

 Summary: Implement Query Text based MaterializedView rewrite
 Key: HIVE-24274
 URL: https://issues.apache.org/jira/browse/HIVE-24274
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Besides the way queries are currently rewritten to use materialized views in 
Hive, this project provides an alternative:
Compare the query text with the stored query text of the materialized views. If 
a match is found, the original query's logical plan can be replaced by a scan 
on the materialized view.
- Only materialized views which are enabled for rewriting can participate.
- Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object by 
adding a lookup method by query text.
- There might be more than one materialized view with the same query text. In 
this case choose the first valid one.
- Validation can be done by calling 
*Hive.validateMaterializedViewsFromRegistry()*.
- The scope of this first patch is limited to queries whose entire text can be 
matched.
- Use the expanded query text (fully qualified column and table names) for 
comparison (see the sketch below).
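A minimal sketch of the intended behavior, using hypothetical table and view 
names (and assuming the source table is transactional and the materialized view 
is enabled for rewriting):
{code}
-- Hypothetical example: mv1 is created over an existing table default.t1.
create materialized view mv1 as
select t1.col0 from default.t1 where t1.col0 > 2;

-- The expanded text of this query matches the stored expanded text of mv1, so
-- its logical plan could be replaced by a scan on default.mv1 without any
-- structural plan matching.
select t1.col0 from default.t1 where t1.col0 > 2;
{code}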




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24249) Create View fails if a materialized view exists with the same query

2020-10-09 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24249:
-

 Summary: Create View fails if a materialized view exists with the 
same query
 Key: HIVE-24249
 URL: https://issues.apache.org/jira/browse/HIVE-24249
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code:java}
create table t1(col0 int) STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

create materialized view mv1 as
select * from t1 where col0 > 2;

create view mv1 as
select sub.* from (select * from t1 where col0 > 2) sub
where sub.col0 = 10;
{code}
The planner realizes that the view definition has a subquery which matches the 
materialized view query and replaces it with a scan on the materialized view.
{code:java}
HiveProject($f0=[CAST(10):INTEGER])
  HiveFilter(condition=[=(10, $0)])
HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1])
{code}
Then an exception is thrown:
{code:java}
 org.apache.hadoop.hive.ql.parse.SemanticException: View definition references 
materialized view default.mv1
at 
org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211)
at 
org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at 

[jira] [Created] (HIVE-24199) Incorrect result when subquery in exists contains limit

2020-09-24 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24199:
-

 Summary: Incorrect result when subquery in exists contains limit
 Key: HIVE-24199
 URL: https://issues.apache.org/jira/browse/HIVE-24199
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code:java}
create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;

insert into web_sales values
(1, 1),
(1, 2),
(2, 1),
(2, 2);

select * from web_sales ws1
where exists (select 1 from web_sales ws2 where ws1.ws_order_number = 
ws2.ws_order_number limit 1);
1   1
1   2
{code}
{code:java}
CBO PLAN:
HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
  HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
HiveFilter(condition=[IS NOT NULL($0)])
  HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
  HiveProject(ws_order_number=[$0])
HiveSortLimit(fetch=[1])  <-- This shouldn't be added
  HiveProject(ws_order_number=[$0])
HiveFilter(condition=[IS NOT NULL($0)])
  HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
{code}
A LIMIT n on the right side of the join reduces the result set coming from the 
right side to only n records, hence not all ws_order_number values are included, 
which leads to a correctness issue (here all four rows should be returned).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24081) Enable pre-materializing CTEs referenced in scalar subqueries

2020-08-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-24081:
-

 Summary: Enable pre-materializing CTEs referenced in scalar 
subqueries
 Key: HIVE-24081
 URL: https://issues.apache.org/jira/browse/HIVE-24081
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


HIVE-11752 introduced materializing CTEs based on the config
{code}
hive.optimize.cte.materialize.threshold
{code}
The goals of this jira are to
* extend the implementation to support materializing CTEs referenced in 
scalar subqueries (see the sketch below)
* add a config to materialize only CTEs with aggregate output
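As an illustration only (hypothetical table and column names), a query of this 
shape references a CTE from a scalar subquery and is the kind of case this 
extension targets:
{code}
set hive.optimize.cte.materialize.threshold=1;

-- The CTE is referenced only from the scalar subquery in the WHERE clause.
with avg_sales as (select avg(amount) as avg_amount from sales)
select item_id
from sales
where amount > (select avg_amount from avg_sales);
{code}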



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23939) SharedWorkOptimizer: taking the union of columns in mergeable TableScans

2020-07-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23939:
-

 Summary: SharedWorkOptimizer: taking the union of columns in 
mergeable TableScans
 Key: HIVE-23939
 URL: https://issues.apache.org/jira/browse/HIVE-23939
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
POSTHOOK: query: explain
select case when (select count(*) 
  from store_sales 
  where ss_quantity between 1 and 20) > 409437
then (select avg(ss_ext_list_price) 
  from store_sales 
  where ss_quantity between 1 and 20) 
else (select avg(ss_net_paid_inc_tax)
  from store_sales
  where ss_quantity between 1 and 20) end bucket1 ,
   case when (select count(*)
  from store_sales
  where ss_quantity between 21 and 40) > 4595804
then (select avg(ss_ext_list_price)
  from store_sales
  where ss_quantity between 21 and 40) 
else (select avg(ss_net_paid_inc_tax)
  from store_sales
  where ss_quantity between 21 and 40) end bucket2,
   case when (select count(*)
  from store_sales
  where ss_quantity between 41 and 60) > 7887297
then (select avg(ss_ext_list_price)
  from store_sales
  where ss_quantity between 41 and 60)
else (select avg(ss_net_paid_inc_tax)
  from store_sales
  where ss_quantity between 41 and 60) end bucket3,
   case when (select count(*)
  from store_sales
  where ss_quantity between 61 and 80) > 10872978
then (select avg(ss_ext_list_price)
  from store_sales
  where ss_quantity between 61 and 80)
else (select avg(ss_net_paid_inc_tax)
  from store_sales
  where ss_quantity between 61 and 80) end bucket4,
   case when (select count(*)
  from store_sales
  where ss_quantity between 81 and 100) > 43571537
then (select avg(ss_ext_list_price)
  from store_sales
  where ss_quantity between 81 and 100)
else (select avg(ss_net_paid_inc_tax)
  from store_sales
  where ss_quantity between 81 and 100) end bucket5
from reason
where r_reason_sk = 1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@reason
POSTHOOK: Input: default@store_sales
POSTHOOK: Output: hdfs://### HDFS PATH ###
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Reducer 34 (CUSTOM_SIMPLE_EDGE), Reducer 9 (CUSTOM_SIMPLE_EDGE)
Reducer 11 <- Reducer 10 (CUSTOM_SIMPLE_EDGE), Reducer 18 (CUSTOM_SIMPLE_EDGE)
Reducer 12 <- Reducer 11 (CUSTOM_SIMPLE_EDGE), Reducer 24 (CUSTOM_SIMPLE_EDGE)
Reducer 13 <- Reducer 12 (CUSTOM_SIMPLE_EDGE), Reducer 30 (CUSTOM_SIMPLE_EDGE)
Reducer 14 <- Reducer 13 (CUSTOM_SIMPLE_EDGE), Reducer 19 (CUSTOM_SIMPLE_EDGE)
Reducer 15 <- Reducer 14 (CUSTOM_SIMPLE_EDGE), Reducer 25 (CUSTOM_SIMPLE_EDGE)
Reducer 16 <- Reducer 15 (CUSTOM_SIMPLE_EDGE), Reducer 31 (CUSTOM_SIMPLE_EDGE)
Reducer 18 <- Map 17 (CUSTOM_SIMPLE_EDGE)
Reducer 19 <- Map 17 (CUSTOM_SIMPLE_EDGE)
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Reducer 20 (CUSTOM_SIMPLE_EDGE)
Reducer 20 <- Map 17 (CUSTOM_SIMPLE_EDGE)
Reducer 21 <- Map 17 (CUSTOM_SIMPLE_EDGE)
Reducer 22 <- Map 17 (CUSTOM_SIMPLE_EDGE)
Reducer 24 <- Map 23 (CUSTOM_SIMPLE_EDGE)
Reducer 25 <- Map 23 (CUSTOM_SIMPLE_EDGE)
Reducer 26 <- Map 23 (CUSTOM_SIMPLE_EDGE)
Reducer 27 <- Map 23 (CUSTOM_SIMPLE_EDGE)
Reducer 28 <- Map 23 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE), Reducer 26 (CUSTOM_SIMPLE_EDGE)
Reducer 30 <- Map 29 (CUSTOM_SIMPLE_EDGE)
Reducer 31 <- Map 29 (CUSTOM_SIMPLE_EDGE)
Reducer 32 <- Map 29 (CUSTOM_SIMPLE_EDGE)
Reducer 33 <- Map 29 (CUSTOM_SIMPLE_EDGE)
Reducer 34 <- Map 29 (CUSTOM_SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE), Reducer 32 (CUSTOM_SIMPLE_EDGE)
Reducer 5 <- Reducer 21 (CUSTOM_SIMPLE_EDGE), Reducer 4 (CUSTOM_SIMPLE_EDGE)
Reducer 6 <- Reducer 27 (CUSTOM_SIMPLE_EDGE), Reducer 5 (CUSTOM_SIMPLE_EDGE)
Reducer 7 <- Reducer 33 (CUSTOM_SIMPLE_EDGE), Reducer 6 (CUSTOM_SIMPLE_EDGE)
Reducer 8 <- Reducer 22 (CUSTOM_SIMPLE_EDGE), Reducer 7 (CUSTOM_SIMPLE_EDGE)
Reducer 9 <- Reducer 28 (CUSTOM_SIMPLE_EDGE), Reducer 8 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 16
  File Output Operator [FS_154]
Select Operator [SEL_153] (rows=2 width=560)
  Output:["_col0","_col1","_col2","_col3","_col4"]
  Merge Join Operator [MERGEJOIN_185] (rows=2 width=1140)
Conds:(Left 

[jira] [Created] (HIVE-23911) CBO fails when query has distinct in function and having clause

2020-07-23 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23911:
-

 Summary: CBO fails when query has distinct in function and having 
clause
 Key: HIVE-23911
 URL: https://issues.apache.org/jira/browse/HIVE-23911
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
create table t (col0 int, col1 int);

select col0, count(distinct col1) from t
group by col0
having count(distinct col1) > 1;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23908) Rewrite plan to join back tables: handle root input is an Aggregate

2020-07-23 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23908:
-

 Summary: Rewrite plan to join back tables: handle root input is an 
Aggregate
 Key: HIVE-23908
 URL: https://issues.apache.org/jira/browse/HIVE-23908
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
EXPLAIN CBO
SELECT
C_CUSTOMER_ID
FROM
CUSTOMER
,   STORE_SALES
WHERE
C_CUSTOMER_SK   =   SS_CUSTOMER_SK
GROUP BY
C_CUSTOMER_SK
,   C_CUSTOMER_ID
,   C_FIRST_NAME
,   C_LAST_NAME
,   C_PREFERRED_CUST_FLAG
,   C_BIRTH_COUNTRY
,   C_LOGIN
,   C_EMAIL_ADDRESS
{code}
{code}
HiveProject(c_customer_id=[$1])
  HiveAggregate(group=[{0, 1}])
HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$5], 
$f6=[$6], $f7=[$7])
  HiveJoin(condition=[=($0, $8)], joinType=[inner], algorithm=[none], 
cost=[not available])
HiveProject(c_customer_sk=[$0], c_customer_id=[$1], c_first_name=[$8], 
c_last_name=[$9], c_preferred_cust_flag=[$10], c_birth_country=[$14], 
c_login=[$15], c_email_address=[$16])
  HiveTableScan(table=[[default, customer]], table:alias=[customer])
HiveProject(ss_customer_sk=[$3])
  HiveFilter(condition=[IS NOT NULL($3)])
HiveTableScan(table=[[default, store_sales]], 
table:alias=[store_sales])

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23898) Query fails if identifier contains double quotes or semicolon char

2020-07-22 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23898:
-

 Summary: Query fails if identifier contains double quotes or 
semicolon char
 Key: HIVE-23898
 URL: https://issues.apache.org/jira/browse/HIVE-23898
 Project: Hive
  Issue Type: Bug
  Components: CLI, Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
CREATE TABLE `t;`(a int);
{code}
{code}
CREATE TABLE `t"`(a int);
{code}
{code}
[ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:62 Client execution failed 
with error code = 4 
running 
CREATE TABLE `t 
fname=test.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.ParseException: line 2:15 character '' 
not supported here
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23888) Simplify special_character_in_tabnames_1.q

2020-07-21 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23888:
-

 Summary: Simplify special_character_in_tabnames_1.q
 Key: HIVE-23888
 URL: https://issues.apache.org/jira/browse/HIVE-23888
 Project: Hive
  Issue Type: Task
  Components: Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


* Move similar queries to unit tests in the parser module and keep only one 
in the q test.
* Use *explain* instead of executing the queries where possible, since we are 
focusing on parser testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23750) Rewrite plan to join back tables: support function calls in project

2020-06-23 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23750:
-

 Summary: Rewrite plan to join back tables: support function calls 
in project
 Key: HIVE-23750
 URL: https://issues.apache.org/jira/browse/HIVE-23750
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa


{code}
select
c_first_name || ' ' || c_last_name,
(ss_quantity * ss_list_price) * (1.0 - c_discount),
c_customer_sk,
ss_customer_sk
from store_sales ss
join customer c on ss_customer_sk = c_customer_sk;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23736) Disable topn in ReduceSinkOp if a TNK is introduced

2020-06-22 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23736:
-

 Summary: Disable topn in ReduceSinkOp if a TNK is introduced
 Key: HIVE-23736
 URL: https://issues.apache.org/jira/browse/HIVE-23736
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Both the Reduce Sink and the TopNKey operator have top-n key filtering 
functionality. If a TNK operator is introduced, this filtering is done twice.
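A minimal sketch of a query shape where both mechanisms apply, assuming top-n 
key optimization is enabled (the table name is only illustrative):
{code}
-- ORDER BY plus LIMIT lets the planner insert a TopNKey operator in front of
-- the Reduce Sink; the Reduce Sink also keeps its own top-n filtering, so the
-- same filtering work is currently done twice.
select key, value from src order by key limit 10;
{code}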



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23706) Fix nulls first sorting behavior

2020-06-16 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23706:
-

 Summary: Fix nulls first sorting behavior
 Key: HIVE-23706
 URL: https://issues.apache.org/jira/browse/HIVE-23706
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23517) Update tpcds queries: q4 q11 q74

2020-05-20 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23517:
-

 Summary: Update tpcds queries: q4 q11 q74
 Key: HIVE-23517
 URL: https://issues.apache.org/jira/browse/HIVE-23517
 Project: Hive
  Issue Type: Task
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23493) Rewrite plan to join back tables with many projected columns joined multiple times

2020-05-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23493:
-

 Summary: Rewrite plan to join back tables with many projected 
columns joined multiple times
 Key: HIVE-23493
 URL: https://issues.apache.org/jira/browse/HIVE-23493
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23491) Move ParseDriver to parser module

2020-05-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23491:
-

 Summary: Move ParseDriver to parser module
 Key: HIVE-23491
 URL: https://issues.apache.org/jira/browse/HIVE-23491
 Project: Hive
  Issue Type: Improvement
  Components: Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Move the *ParseDriver* class and the syntax-parsing related unit tests to the 
parser module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23406) SharedWorkOptimizer should check nullSortOrders when comparing ReduceSink operators

2020-05-07 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23406:
-

 Summary: SharedWorkOptimizer should check nullSortOrders when 
comparing ReduceSink operators
 Key: HIVE-23406
 URL: https://issues.apache.org/jira/browse/HIVE-23406
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Krisztian Kasa


SharedWorkOptimizer does not check the null sort order in ReduceSinkDesc when 
comparing two ReduceSink operators:
 
[https://github.com/apache/hive/blob/ca9aba606c4d09b91ee28bf9ee1ae918db8cdfb9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1444]
{code:java}
  ReduceSinkDesc op1Conf = ((ReduceSinkOperator) op1).getConf();
  ReduceSinkDesc op2Conf = ((ReduceSinkOperator) op2).getConf();

  if (StringUtils.equals(op1Conf.getKeyColString(), 
op2Conf.getKeyColString()) &&
StringUtils.equals(op1Conf.getValueColsString(), 
op2Conf.getValueColsString()) &&
StringUtils.equals(op1Conf.getParitionColsString(), 
op2Conf.getParitionColsString()) &&
op1Conf.getTag() == op2Conf.getTag() &&
StringUtils.equals(op1Conf.getOrder(), op2Conf.getOrder()) &&
op1Conf.getTopN() == op2Conf.getTopN() &&
canDeduplicateReduceTraits(op1Conf, op2Conf)) {
return true;
  } else {
return false;
  }
{code}
An expression like
{code:java}
StringUtils.equals(op1Conf.getNullOrder(), op2Conf.getNullOrder()) &&
{code}
should be added.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23372) Project not defined correctly after reordering a join ADDENDUM - fix sharedwork

2020-05-05 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23372:
-

 Summary: Project not defined correctly after reordering a join 
ADDENDUM - fix sharedwork
 Key: HIVE-23372
 URL: https://issues.apache.org/jira/browse/HIVE-23372
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23135) Add RelDistribution trait to HiveSortExchange

2020-04-03 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23135:
-

 Summary: Add RelDistribution trait to HiveSortExchange 
 Key: HIVE-23135
 URL: https://issues.apache.org/jira/browse/HIVE-23135
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23132) Add test of Explain CBO of Merge statements

2020-04-02 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23132:
-

 Summary: Add test of Explain CBO of Merge statements
 Key: HIVE-23132
 URL: https://issues.apache.org/jira/browse/HIVE-23132
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23130) User friendly error message when MV rewriting fails

2020-04-02 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23130:
-

 Summary: User friendly error message when MV rewriting fails
 Key: HIVE-23130
 URL: https://issues.apache.org/jira/browse/HIVE-23130
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


If materialized view rewriting fails because an unsupported SQL clause or 
operator is used, we get an error message like this:
{code}
FAILED: SemanticException Cannot enable automatic rewriting for materialized 
view. Unsupported RelNode type HiveSortExchange encountered in the query plan
{code}
This refers to the *HiveSortExchange* operator, which is introduced into the CBO 
plan if the statement has a *sort by* clause. This may not be clear to the user.
{code}
create materialized view cmv_mat_view as select a, b, c from cmv_basetable sort 
by a;
{code} 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23120) TopNKey related tests should be run by TestMiniLlapLocalCliDriver only

2020-04-01 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23120:
-

 Summary: TopNKey related tests should be run by 
TestMiniLlapLocalCliDriver only
 Key: HIVE-23120
 URL: https://issues.apache.org/jira/browse/HIVE-23120
 Project: Hive
  Issue Type: Task
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


TopNKey optimization is only used when the execution framework is Tez.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23119) Test sort_acid should be run by TestMiniLlapLocalCliDriver only

2020-04-01 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23119:
-

 Summary: Test sort_acid should be run by 
TestMiniLlapLocalCliDriver only
 Key: HIVE-23119
 URL: https://issues.apache.org/jira/browse/HIVE-23119
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23101) Fix topnkey_grouping_sets

2020-03-30 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23101:
-

 Summary: Fix topnkey_grouping_sets
 Key: HIVE-23101
 URL: https://issues.apache.org/jira/browse/HIVE-23101
 Project: Hive
  Issue Type: Sub-task
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


Test *topnkey_grouping_sets* fails intermittently.

Queries which project two columns but order by only one of them can have more 
than one valid result set:
{code}
CREATE TABLE t_test_grouping_sets(
  a int,
  b int,
  c int
);

INSERT INTO t_test_grouping_sets VALUES
(NULL, NULL, NULL),
(5, 2, 3),
(10, 11, 12),
(NULL, NULL, NULL),
(NULL, NULL, NULL),
(6, 2, 1),
(7, 8, 4), (7, 8, 4), (7, 8, 4),
(5, 1, 2), (5, 1, 2), (5, 1, 2),
(NULL, NULL, NULL);

SELECT a, b FROM t_test_grouping_sets GROUP BY GROUPING SETS ((a, b), (a), (b), 
()) ORDER BY a LIMIT 10;
{code}
{code}
5   NULL
5   2
5   1
6   2
6   NULL
7   8
7   NULL
10  NULL
10  11
NULL    1
{code}
{code}
5   NULL
5   2
5   1
6   2
6   NULL
7   8
7   NULL
10  NULL
10  11
NULL    NULL
{code}
Since we don't order by *b*, both result sets are valid.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23094) Implement Explain CBO of Update and Delete statements

2020-03-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23094:
-

 Summary: Implement Explain CBO of Update and Delete statements
 Key: HIVE-23094
 URL: https://issues.apache.org/jira/browse/HIVE-23094
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

create table acidtlb(a int, b int) clustered by (a) into 2 buckets stored as 
orc TBLPROPERTIES ('transactional'='true');

explain cbo
update acidtlb set b=777;
{code}
doesn't print the CBO plan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23089) Add constraint checks to CBO plan

2020-03-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-23089:
-

 Summary: Add constraint checks to CBO plan
 Key: HIVE-23089
 URL: https://issues.apache.org/jira/browse/HIVE-23089
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
create table acid_uami(i int,
 de decimal(5,2) constraint nn1 not null enforced,
 vc varchar(128) constraint nn2 not null enforced) clustered by 
(i) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
explain
update acid_uami set de=null where i=1;
{code}
Non-CBO path:
{code:java}
Map Operator Tree:
TableScan
alias: acid_uami
filterExpr: ((i = 1) and enforce_constraint(vc is not null)) (type: 
boolean)
Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: ((i = 1) and enforce_constraint(vc is not null)) 
(type: boolean)
{code}
CBO path:
{code:java}
Map Reduce
  Map Operator Tree:
  TableScan
alias: acid_uami
filterExpr: (i = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: (i = 1) (type: boolean)
...
  Reduce Operator Tree:
...
 Filter Operator
predicate: enforce_constraint((null is not null and _col3 is not 
null)) (type: boolean)
{code}

In the CBO path the enforce_constraint function is added to the plan only after 
the CBO plan has already been generated and optimized.
{code}
HiveSortExchange(distribution=[any], collation=[[0]])
  HiveProject(row__id=[$5], i=[CAST(1):INTEGER], _o__c2=[null:NULL], vc=[$2])
HiveFilter(condition=[=($0, 1)])
  HiveTableScan(table=[[default, acid_uami]], table:alias=[acid_uami])
{code} 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22910) CBO fails when subquery with rank left joined

2020-02-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22910:
-

 Summary: CBO fails when subquery with rank left joined
 Key: HIVE-22910
 URL: https://issues.apache.org/jira/browse/HIVE-22910
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


*Repro*
{code}
CREATE TABLE table1(a int, b int);

ANALYZE TABLE table1 COMPUTE STATISTICS FOR COLUMNS;


EXPLAIN CBO
SELECT sub1.r FROM
(
SELECT
RANK() OVER (ORDER BY t1.b desc) as r
FROM table1 t1
JOIN table1 t2 ON t1.a = t2.b
) sub1
LEFT OUTER JOIN table1 t3
ON sub1.r = t3.a;
{code}

{code}
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
reference 'b': (possible column names are: $hdt$_0.a, $hdt$_0.b, $hdt$_1.b)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13089)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13031)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12999)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinKeys(SemanticAnalyzer.java:9248)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:9409)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:9624)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11781)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11661)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:534)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12547)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:183)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:594)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:540)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:534)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:249)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:193)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:415)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:346)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:709)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:679)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:169)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 

[jira] [Created] (HIVE-22892) Unable to compile query if CTE joined

2020-02-14 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22892:
-

 Summary: Unable to compile query if CTE joined
 Key: HIVE-22892
 URL: https://issues.apache.org/jira/browse/HIVE-22892
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Repro:
{code}
CREATE TABLE t1 (a int, b varchar(100));

SELECT S.a, t1.a, t1.b FROM (
WITH
 sub1 AS (SELECT a, b FROM t1 WHERE b = 'c')
 SELECT sub1.a, sub1.b FROM sub1
) S
JOIN t1 ON S.a = t1.a;
{code}
{code}
java.lang.AssertionError
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getUnescapedUnqualifiedTableName(BaseSemanticAnalyzer.java:463)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:2870)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5047)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1787)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1734)
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1495)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:471)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12550)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:286)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:197)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:810)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:756)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:750)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:249)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:193)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:415)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:346)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:709)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:679)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:169)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 

[jira] [Created] (HIVE-22867) Add partitioning support to VectorTopNKeyOperator

2020-02-11 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22867:
-

 Summary: Add partitioning support to VectorTopNKeyOperator 
 Key: HIVE-22867
 URL: https://issues.apache.org/jira/browse/HIVE-22867
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22808) HiveRelFieldTrimmer does not handle HiveTableFunctionScan

2020-01-31 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22808:
-

 Summary: HiveRelFieldTrimmer does not handle HiveTableFunctionScan
 Key: HIVE-22808
 URL: https://issues.apache.org/jira/browse/HIVE-22808
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


*Repro*
{code:java}
CREATE TABLE table_16 (
timestamp_col_19    timestamp,
timestamp_col_29    timestamp,
int_col_27  int,
int_col_39  int,
boolean_col_18  boolean,
varchar0045_col_23  varchar(45)
);


CREATE TABLE table_7 (
int_col_10  int,
bigint_col_3    bigint
);

CREATE TABLE table_10 (
boolean_col_8   boolean,
boolean_col_16  boolean,
timestamp_col_5 timestamp,
timestamp_col_15    timestamp,
timestamp_col_30    timestamp,
decimal3825_col_26  decimal(38, 25),
smallint_col_9  smallint,
int_col_18  int
);

explain cbo 
SELECT
DISTINCT COALESCE(a4.timestamp_col_15, IF(a4.boolean_col_16, 
a4.timestamp_col_30, a4.timestamp_col_5)) AS timestamp_col
FROM table_7 a3
RIGHT JOIN table_10 a4 
WHERE (a3.bigint_col_3) >= (a4.int_col_18)
INTERSECT ALL
SELECT
COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS 
TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
) AS timestamp_col
FROM table_16 a1
GROUP BY COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(
COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
);
{code}
The CBO plan contains unnecessary columns, or even all columns of a table, in 
projections like:
{code:java}
  HiveProject(int_col_10=[$0], bigint_col_3=[$1], 
BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], 
CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])
{code}
*Cause*
 The plan contains a HiveTableFunctionScan operator:
{code:java}
HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], 
rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])
{code}
HiveTableFunctionScan is handled neither by HiveRelFieldTrimmer nor by 
RelFieldTrimmer, which are supposed to remove unused columns in the 
CalcitePlanner.applyPreJoinOrderingTransforms(...) phase. The whole subtree 
rooted at HiveTableFunctionScan is ignored.

Whole plan:
{code:java}
CBO PLAN:
HiveProject($f0=[$1])
  HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], 
rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])
HiveProject($f0=[$2], $f1=[$0])
  HiveFilter(condition=[=($1, 2)])
HiveAggregate(group=[{0}], agg#0=[count($1)], agg#1=[min($1)])
  HiveProject($f0=[$0], $f1=[$1])
HiveUnion(all=[true])
  HiveProject($f0=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count()])
  HiveProject($f0=[$0])
HiveAggregate(group=[{0}])
  HiveProject($f0=[CASE(IS NOT NULL($7), $7, if($5, $8, 
$6))])
HiveJoin(condition=[>=($1, $13)], joinType=[inner], 
algorithm=[none], cost=[not available])
  HiveProject(int_col_10=[$0], bigint_col_3=[$1], 
BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], 
CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])
HiveFilter(condition=[IS NOT NULL($1)])
  HiveTableScan(table=[[default, table_7]], 
table:alias=[a3])
  HiveProject(boolean_col_16=[$0], 
timestamp_col_5=[$1], timestamp_col_15=[$2], timestamp_col_30=[$3], 
int_col_18=[$4], BLOCK__OFFSET__INSIDE__FILE=[$5], INPUT__FILE__NAME=[$6], 
ROW__ID=[$7], CAST=[CAST($4):BIGINT])
HiveFilter(condition=[IS NOT NULL(CAST($4):BIGINT)])
  HiveTableScan(table=[[default, table_10]], 
table:alias=[a4])
  HiveProject($f0=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count()])
  HiveProject($f0=[$0])
HiveAggregate(group=[{0}])
  HiveProject($f0=[CASE(IS NOT NULL(least(CASE(IS NOT 
NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 
2014-08-16 00:00:00:TIMESTAMP(9, least(CASE(IS NOT NULL($0), $0, 2010-03-29 
00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16 
00:00:00:TIMESTAMP(9))), greatest(CASE(IS NOT NULL($0), $0, 2013-07-01 
00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2028-06-18 

[jira] [Created] (HIVE-22787) NPE when compiling query contains intersect all

2020-01-28 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22787:
-

 Summary: NPE when compiling query contains intersect all
 Key: HIVE-22787
 URL: https://issues.apache.org/jira/browse/HIVE-22787
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


The query contains an INTERSECT ALL operator and one of its operands has an 
OUTER JOIN, like:
{code}
SELECT ... FROM t1 RIGHT OUTER JOIN t2 ON ...
INTERSECT ALL
SELECT ...
{code}
In this case both AST trees (before and after Calcite) have a TOK_INTERSECTALL 
node, which is not handled when generating the plan in
{code}
org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
private Operator genPlan(QB parent, QBExpr qbexpr) throws SemanticException {
if (qbexpr.getOpcode() == QBExpr.Opcode.NULLOP) {
  boolean skipAmbiguityCheck = viewSelect == null && 
parent.isTopLevelSelectStarQuery();
  return genPlan(qbexpr.getQB(), skipAmbiguityCheck);
}
if (qbexpr.getOpcode() == QBExpr.Opcode.UNION) {
  Operator qbexpr1Ops = genPlan(parent, qbexpr.getQBExpr1());
  Operator qbexpr2Ops = genPlan(parent, qbexpr.getQBExpr2());

  return genUnionPlan(qbexpr.getAlias(), qbexpr.getQBExpr1().getAlias(),
  qbexpr1Ops, qbexpr.getQBExpr2().getAlias(), qbexpr2Ops);
}
return null;
  }{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22749) ReEnable TopNKey optimization in vectorized q tests

2020-01-19 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22749:
-

 Summary: ReEnable TopNKey optimization in vectorized q tests
 Key: HIVE-22749
 URL: https://issues.apache.org/jira/browse/HIVE-22749
 Project: Hive
  Issue Type: Task
  Components: Physical Optimizer
Reporter: Krisztian Kasa


TopNKey optimization was disabled in the following q tests because the current 
implementation of VectorizedTopNKeyOperator does not support partitioning.

subquery_in.q,subquery_notin.q,vector_windowing_streaming.q,windowing_filter.q



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22735) TopNKey operator deduplication

2020-01-15 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22735:
-

 Summary: TopNKey operator deduplication
 Key: HIVE-22735
 URL: https://issues.apache.org/jira/browse/HIVE-22735
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


In some cases more than one TNK operator has the same expressions in the same 
operator tree, or the difference is only a constant column. In most of these 
cases only one TNK operator should remain.

{code}
++
|  Explain   |
++
| Plan not optimized by CBO. |
||
| Vertex dependency in root stage|
| Map 1 <- Reducer 8 (BROADCAST_EDGE)|
| Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 
(BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) |
| Reducer 3 <- Reducer 2 (SIMPLE_EDGE)   |
| Reducer 4 <- Reducer 3 (SIMPLE_EDGE)   |
| Reducer 8 <- Map 7 (CUSTOM_SIMPLE_EDGE)|
||
| Stage-0|
|   Fetch Operator   |
| limit:50   |
| Stage-1|
|   Reducer 4 vectorized |
|   File Output Operator [FS_127]|
| Limit [LIM_126] (rows=50 width=538)|
|   Number of rows:50|
|   Select Operator [SEL_125] (rows=190 width=538) |
| Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
|   <-Reducer 3 [SIMPLE_EDGE]|
| SHUFFLE [RS_30]|
|   Select Operator [SEL_29] (rows=190 width=538) |
| 
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
| Group By Operator [GBY_28] (rows=190 width=538) |
|   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(VALUE._col0)","avg(VALUE._col1)","avg(VALUE._col2)","avg(VALUE._col3)"],keys:KEY._col0,
 KEY._col1, KEY._col2 |
| <-Reducer 2 [SIMPLE_EDGE]  |
|   SHUFFLE [RS_27]  |
| PartitionCols:_col0, _col1, _col2 |
| Group By Operator [GBY_26] (rows=190 width=1134) |
|   
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(_col9)","avg(_col11)","avg(_col18)","avg(_col12)"],keys:_col102,
 _col93, 0L |
|   Top N Key Operator [TNK_60] (rows=127 width=234) |
| keys:_col102, _col93, 0L,top n:50 |
| Select Operator [SEL_25] (rows=127 width=234) |
|   
Output:["_col9","_col11","_col12","_col18","_col93","_col102"] |
|   Top N Key Operator [TNK_58] (rows=127 width=234) |
| keys:_col102, _col93,top n:50 |
| Filter Operator [FIL_49] (rows=127 width=234) |
|   predicate:((_col22 = _col38) and (_col1 = 
_col101) and (_col6 = _col69) and (_col3 = _col26)) |
|   Map Join Operator [MAPJOIN_102] (rows=2044 
width=232) |
| 
Conds:MAPJOIN_101._col1=RS_123.i_item_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93","_col101","_col102"]
 |
|   <-Map 9 [BROADCAST_EDGE] vectorized |
| BROADCAST [RS_123] |
|   PartitionCols:i_item_sk |
|   Filter Operator [FIL_122] (rows=204000 
width=108) |
| predicate:i_item_sk is not null |
| TableScan [TS_4] (rows=204000 width=108) |
|   
tpcds_bin_partitioned_orc_100@item,item, ACID 
table,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_item_id"] |
|   <-Map Join Operator [MAPJOIN_101] (rows=2010 
width=118) |
|   
Conds:MAPJOIN_100._col6=RS_107.s_store_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93"]
 |
| <-Map 7 [BROADCAST_EDGE] vectorized |
|   PARTITION_ONLY_SHUFFLE [RS_107] |
| PartitionCols:s_store_sk |
|  

[jira] [Created] (HIVE-22692) Use only fixDecimalDataTypePhysicalVariations when vectorizing TopNKey operator

2020-01-03 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22692:
-

 Summary: Use only fixDecimalDataTypePhysicalVariations when 
vectorizing TopNKey operator
 Key: HIVE-22692
 URL: https://issues.apache.org/jira/browse/HIVE-22692
 Project: Hive
  Issue Type: Task
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Currently both 'fixDecimalDataTypePhysicalVariations' and 
'getVectorExpressionsUpConvertDecimal64' are called when vectorizing the TopNKey 
operator in 'Vectorizer.java':
{code}
vContext.markActualScratchColumns();
try {
  List keyColumns = topNKeyDesc.getKeyColumns();

  keyExpressions = 
vContext.getVectorExpressionsUpConvertDecimal64(keyColumns);
  fixDecimalDataTypePhysicalVariations(vContext, keyExpressions);

} finally {
  vContext.freeMarkedScratchColumns();
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22666) Introduce TopNKey operator for PTF Reduce Sink

2019-12-20 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22666:
-

 Summary: Introduce TopNKey operator for PTF Reduce Sink
 Key: HIVE-22666
 URL: https://issues.apache.org/jira/browse/HIVE-22666
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22652) TopNKey push through Group by with Grouping sets

2019-12-17 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22652:
-

 Summary: TopNKey push through Group by with Grouping sets
 Key: HIVE-22652
 URL: https://issues.apache.org/jira/browse/HIVE-22652
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22552) Some q tests use the same name for test tables

2019-11-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22552:
-

 Summary: Some q tests use the same name for test tables
 Key: HIVE-22552
 URL: https://issues.apache.org/jira/browse/HIVE-22552
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Some q tests use the name "t_test" when creating a test table. This can cause 
conflicts when the tests are run in parallel against the same metastore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22544) Disable null sort order at user level

2019-11-26 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22544:
-

 Summary: Disable null sort order at user level
 Key: HIVE-22544
 URL: https://issues.apache.org/jira/browse/HIVE-22544
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 4.0.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


"sort order" and "null sort order" in ReduceSinkDesc and TopNKeyDesc should not 
be exposed at user level 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22508) KeyWrapperComparator throws exception

2019-11-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22508:
-

 Summary: KeyWrapperComparator throws exception
 Key: HIVE-22508
 URL: https://issues.apache.org/jira/browse/HIVE-22508
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


TopNKeyOperator.KeyWrapperComparator throws an exception when a new key and a 
copied key are compared.

The current implementation uses the standard object inspectors for all 
KeyWrapper instances. However, when comparing untouched KeyWrappers, the key 
object inspector should be used, which can be extracted from 
Operator.inputObjectInspectors during the initialization of the key's 
ExprNodeEvaluator.

This can cause a ClassCastException when the comparator is used in collections 
like TreeSet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22507) KeyWrapper comparator create subcomparator instances every comparison

2019-11-18 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22507:
-

 Summary: KeyWrapper comparator create subcomparator instances 
every comparison 
 Key: HIVE-22507
 URL: https://issues.apache.org/jira/browse/HIVE-22507
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


TopNKeyOperator.KeyWrapperComparator uses a separate comparator for each field 
of the keys. Every time TopNKeyOperator.KeyWrapperComparator.compare is 
called, new instances of these field comparators are created.

The field comparators should be created before any comparison takes place, 
probably in the constructor of KeyWrapperComparator or during initializeOp.

https://github.com/apache/hive/blob/fc81b8909b1a0e6aa15900387a98bccf38ae2247/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L964
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22489) Reduce Sink orders nulls first

2019-11-13 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22489:
-

 Summary: Reduce Sink orders nulls first
 Key: HIVE-22489
 URL: https://issues.apache.org/jira/browse/HIVE-22489
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


When the property hive.default.nulls.last is set to true and no null order is 
explicitly specified in the ORDER BY clause of the query, the null ordering 
should be NULLS LAST.
But some of the Reduce Sink operators still order nulls first.
{code}
SET hive.default.nulls.last=true;

EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key LIMIT 5;
{code}

{code}
PREHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
PREHOOK: type: QUERY
PREHOOK: Input: default@src
 A masked pattern was here 
POSTHOOK: query: EXPLAIN EXTENDED
SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = 
src2.key) ORDER BY src1.key
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
 A masked pattern was here 
OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
FROM (SELECT `key`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t0`
INNER JOIN (SELECT `key`, `value`
FROM `default`.`src`
WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
ORDER BY `t0`.`key`
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: src1
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  GatherStats: false
  Filter Operator
isSamplingPred: false
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
tag: 0
auto parallelism: true
Execution mode: vectorized, llap
LLAP IO: no inputs
Path -> Alias:
 A masked pattern was here 
Path -> Partition:
 A masked pattern was here 
Partition
  base file name: src
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns key,value
columns.comments 'default','default'
columns.types string:string
 A masked pattern was here 
name default.src
numFiles 1
numRows 500
rawDataSize 5312
serialization.ddl struct src { string key, string value}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 5812
 A masked pattern was here 
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns key,value
  

[jira] [Created] (HIVE-22481) Expose null sort order at default level

2019-11-12 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22481:
-

 Summary: Expose null sort order at default level
 Key: HIVE-22481
 URL: https://issues.apache.org/jira/browse/HIVE-22481
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22464) Implement support for NULLS FIRST/LAST in TopNKeyOperator

2019-11-06 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22464:
-

 Summary: Implement support for NULLS FIRST/LAST in TopNKeyOperator
 Key: HIVE-22464
 URL: https://issues.apache.org/jira/browse/HIVE-22464
 Project: Hive
  Issue Type: Improvement
  Components: Physical Optimizer
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22435) Exception when using VectorTopNKeyOperator operator

2019-10-30 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22435:
-

 Summary: Exception when using VectorTopNKeyOperator operator
 Key: HIVE-22435
 URL: https://issues.apache.org/jira/browse/HIVE-22435
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa


Steps to reproduce:
1. Apply the attached patch
{code}
git apply -3 -p0 HIVE-20150.15.patch
{code}
2. Rebuild the project
{code}
mvn clean install -DskipTests
{code}
3. Run the following test
{code}
mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=limit_pushdown3.q -pl itests/qtest -Pitests
{code}

Query execution fails with an exception for the following query:
{code}
select ctinyint, count(distinct(cdouble)) from alltypesorc group by ctinyint 
order by ctinyint limit 20
{code}
{code}
[ERROR] Failures: 
[ERROR]   TestMiniLlapLocalCliDriver.testCliDriver:59 Client execution failed 
with error code = 2 
running 
select ctinyint, count(distinct(cdouble)) from alltypesorc group by ctinyint 
order by ctinyint limit 20 
fname=limit_pushdown3.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
vertexName=Reducer 2, vertexId=vertex_1572454329409_0001_9_01, 
diagnostics=[Task failed, taskId=task_1572454329409_0001_9_01_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
failure ) : 
attempt_1572454329409_0001_9_01_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: cannot find field key from [0:key._col0, 
1:key._col1]
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: cannot find field key from [0:key._col0, 
1:key._col1]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:538)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:153)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:80)
at 
org.apache.hadoop.hive.ql.exec.TopNKeyOperator.initializeOp(TopNKeyOperator.java:106)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorTopNKeyOperator.initializeOp(VectorTopNKeyOperator.java:71)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
... 15 more
], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
attempt_1572454329409_0001_9_01_00_1:java.lang.RuntimeException: 
java.lang.RuntimeException: cannot find field key from [0:key._col0, 
1:key._col1]
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at 

[jira] [Created] (HIVE-22346) Yetus is failing rat check

2019-10-15 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22346:
-

 Summary: Yetus is failing rat check
 Key: HIVE-22346
 URL: https://issues.apache.org/jira/browse/HIVE-22346
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa


{code:java}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18996/standalone-metastore/metastore-server/src/test/resources/ldap/ad.example.com.ldif
 !? 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18996/standalone-metastore/metastore-server/src/test/resources/ldap/example.com.ldif
 !? 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18996/standalone-metastore/metastore-server/src/test/resources/ldap/microsoft.schema.ldif
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22321) Setting default nulls last does not take effect when order direction is specified

2019-10-10 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22321:
-

 Summary: Setting default nulls last does not take effect when 
order direction is specified
 Key: HIVE-22321
 URL: https://issues.apache.org/jira/browse/HIVE-22321
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
SET hive.default.nulls.last=true;
SELECT * FROM t_test ORDER BY col1 ASC;
{code}
{code}
POSTHOOK: query: SELECT * FROM t_test ORDER BY col1 ASC
POSTHOOK: type: QUERY
POSTHOOK: Input: default@t_test
 A masked pattern was here 
NULL
NULL
NULL
NULL
3
5
5
{code}

https://github.com/apache/hive/blob/cb83da943c8919e2ab3751244de5c2879c8fda1d/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g#L2510
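For illustration only (not part of the report), a possible workaround sketch is to request the null ordering explicitly per sort key; the table and column names are taken from the reproduction above, and whether this fully sidesteps the bug is an assumption:

{code}
-- Assumes the t_test table from the reproduction above.
-- Spelling out NULLS LAST avoids relying on hive.default.nulls.last,
-- which this issue reports being ignored once ASC/DESC is specified.
SELECT * FROM t_test ORDER BY col1 ASC NULLS LAST;
{code}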



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22296) Set NULLS LAST as the default null ordering in Sorted by clause

2019-10-07 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22296:
-

 Summary: Set NULLS LAST as the default null ordering in Sorted by 
clause
 Key: HIVE-22296
 URL: https://issues.apache.org/jira/browse/HIVE-22296
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa


1. In HiveParser.g, use the nullsLast() function to determine the default null 
ordering. This was reverted by HIVE-22281.
2. Store the null ordering in the metastore (a DDL sketch where this default 
would apply is shown below).
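For illustration, a minimal DDL sketch (table and column names invented here) of a SORTED BY clause where the proposed NULLS LAST default would be recorded:

{code}
-- Hypothetical bucketed table; under this proposal the null ordering stored
-- for the SORTED BY columns would default to NULLS LAST rather than NULLS FIRST.
CREATE TABLE sorted_by_example (k1 STRING, change_bsk BIGINT)
CLUSTERED BY (k1) SORTED BY (k1 ASC, change_bsk ASC) INTO 4 BUCKETS
STORED AS ORC;
{code}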



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22295) Javadoc for UDAFs

2019-10-07 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22295:
-

 Summary: Javadoc for UDAFs
 Key: HIVE-22295
 URL: https://issues.apache.org/jira/browse/HIVE-22295
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Krisztian Kasa


{code}
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java:33:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java:24:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFFirstValue.java:44:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:35:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java:40:@Description(name
 = "last_value", value = "_FUNC_(x)"): warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java:31:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFNTile.java:41:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java:34:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java:41:@Description(:
 warning: Missing a Javadoc comment.
./ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRowNumber.java:40:@Description(:
 warning: Missing a Javadoc comment.
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22292) Implement Hypothetical-Set Aggregate Functions

2019-10-04 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22292:
-

 Summary: Implement Hypothetical-Set Aggregate Functions
 Key: HIVE-22292
 URL: https://issues.apache.org/jira/browse/HIVE-22292
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


{code}
<hypothetical set function> ::=
  <rank function type> <left paren>
  <hypothetical set function value expression list> <right paren>
  <within group specification>

<rank function type> ::=
  RANK
  | DENSE_RANK
  | PERCENT_RANK
  | CUME_DIST
{code}
Example:
{code}
CREATE TABLE table1 (column1 int);
INSERT INTO table1 VALUES (NULL), (3), (8), (13), (7), (6), (20), (NULL), 
(NULL), (10), (7), (15), (16), (8), (7), (8), (NULL);
{code}
{code}
SELECT rank(6) WITHIN GROUP (ORDER BY column1) FROM table1;
{code}
{code}
2
{code}
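The same WITHIN GROUP syntax applies to the other rank function types in the grammar above; a usage sketch against the same table1 (results omitted, since they depend on the implemented null-handling semantics):

{code}
-- Hypothetical-set form of the remaining rank function types; each takes a
-- constant argument plus an ORDER BY inside WITHIN GROUP, as in the grammar.
SELECT
  dense_rank(6)   WITHIN GROUP (ORDER BY column1),
  percent_rank(6) WITHIN GROUP (ORDER BY column1),
  cume_dist(6)    WITHIN GROUP (ORDER BY column1)
FROM table1;
{code}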



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22281) Create table statement fails with "not supported NULLS LAST for ORDER BY in ASC order"

2019-10-02 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22281:
-

 Summary: Create table statement fails with "not supported NULLS 
LAST for ORDER BY in ASC order"
 Key: HIVE-22281
 URL: https://issues.apache.org/jira/browse/HIVE-22281
 Project: Hive
  Issue Type: Bug
  Components: Parser
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


{code}
CREATE TABLE table_core2c4ywq7yjx ( k1 STRING, f1 STRING, 
sequence_num BIGINT, create_bsk BIGINT, change_bsk BIGINT, op_code 
STRING ) PARTITIONED BY (run_id BIGINT) CLUSTERED BY (k1) SORTED BY (k1, 
change_bsk, sequence_num) INTO 4 BUCKETS STORED AS ORC
{code}
{code}
Error while compiling statement: FAILED: SemanticException create/alter table: 
not supported NULLS LAST for ORDER BY in ASC order
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22250) Describe function does not provide description for rank functions

2019-09-27 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22250:
-

 Summary: Describe function does not provide description for rank 
functions
 Key: HIVE-22250
 URL: https://issues.apache.org/jira/browse/HIVE-22250
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22240) Function percentile_cont fails when array parameter passed

2019-09-24 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22240:
-

 Summary: Function percentile_cont fails when array parameter passed
 Key: HIVE-22240
 URL: https://issues.apache.org/jira/browse/HIVE-22240
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


{code}
SELECT
percentile_cont(array(0.2, 0.5, 0.9)) WITHIN GROUP (ORDER BY value)
FROM t_test;
{code}

hive.log:
{code}
2019-09-24T21:00:43,203 ERROR [LocalJobRunner Map Task Executor #0] 
mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error while processing row
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:793)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
... 11 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileCont$PercentileContEvaluator.iterate(GenericUDAFPercentileCont.java:259)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:214)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:639)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:814)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:720)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:788)
... 17 more

{code}
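For contrast, a sketch of the single-percentile form using the same table and column; the stack trace above points at the array overload (ArrayList cast to HiveDecimalWritable), so the assumption here is that the scalar path is unaffected:

{code}
-- Scalar form of the same aggregate; only the array(...) overload appears to
-- hit the cast in GenericUDAFPercentileCont$PercentileContEvaluator.iterate.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY value) FROM t_test;
{code}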



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22162) MVs are not using ACID tables.

2019-08-30 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22162:
-

 Summary: MVs are not using ACID tables.
 Key: HIVE-22162
 URL: https://issues.apache.org/jira/browse/HIVE-22162
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Affects Versions: 3.1.2
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


{code}
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

SET metastore.strict.managed.tables=true;
SET hive.default.fileformat=textfile;
SET hive.default.fileformat.managed=orc;

SET metastore.create.as.acid=true;

CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));

INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);

CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
AS SELECT a, b, c FROM cmv_basetable_n4;

DESCRIBE FORMATTED cmv_mat_view_n4;
{code}

{code}
POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
...
Table Type: MATERIALIZED_VIEW
Table Parameters:
COLUMN_STATS_ACCURATE   
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
bucketing_version   2   
numFiles1   
numRows 5   
rawDataSize 1025
totalSize   509   
{code}

Missing table parameter:
{code}
transactional = true
{code}
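As a hedged check (not part of the original report), declaring the property explicitly on a materialized view and re-running DESCRIBE FORMATTED should show whether ACID is picked up at all; the view name is invented, and acceptance of TBLPROPERTIES in this position is an assumption about the build under test:

{code}
-- Hypothetical variant with the expected property declared explicitly;
-- with metastore.create.as.acid=true the same parameter should appear by default.
CREATE MATERIALIZED VIEW cmv_mat_view_acid disable rewrite
TBLPROPERTIES ('transactional'='true')
AS SELECT a, b, c FROM cmv_basetable_n4;

DESCRIBE FORMATTED cmv_mat_view_acid;
{code}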



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-21965) Implement parallel processing in HiveStrictManagedMigration

2019-07-08 Thread Krisztian Kasa (JIRA)
Krisztian Kasa created HIVE-21965:
-

 Summary: Implement parallel processing in 
HiveStrictManagedMigration
 Key: HIVE-21965
 URL: https://issues.apache.org/jira/browse/HIVE-21965
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.1.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


This process, kicked off from Ambari, can take many days on systems with 
thousands of tables. The process needs to support parallel execution as it 
iterates through the databases and tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21948) Implement parallel processing in Pre Upgrade Tool

2019-07-03 Thread Krisztian Kasa (JIRA)
Krisztian Kasa created HIVE-21948:
-

 Summary: Implement parallel processing in Pre Upgrade Tool
 Key: HIVE-21948
 URL: https://issues.apache.org/jira/browse/HIVE-21948
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


The Pre Upgrade Tool scans all databases and tables in the warehouse 
sequentially, which can be very slow when there are many tables.

Example: the process took 8-10 hours to complete on ~500k tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21941) Use checkstyle ruleset in Pre Upgrade Tool project

2019-07-01 Thread Krisztian Kasa (JIRA)
Krisztian Kasa created HIVE-21941:
-

 Summary: Use checkstyle ruleset in Pre Upgrade Tool project 
 Key: HIVE-21941
 URL: https://issues.apache.org/jira/browse/HIVE-21941
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


The upgrade-acid/pre-upgrade project does not use the same checkstyle ruleset 
as the Hive root project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21938) Add database and table filter options to PreUpgradeTool

2019-06-30 Thread Krisztian Kasa (JIRA)
Krisztian Kasa created HIVE-21938:
-

 Summary: Add database and table filter options to PreUpgradeTool
 Key: HIVE-21938
 URL: https://issues.apache.org/jira/browse/HIVE-21938
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
 Fix For: 4.0.0


By default, the pre-upgrade tool scans all databases and tables in the 
warehouse. Add database and table filter options so the tool can be run on a 
specific subset of databases and tables only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)