Re: Review Request 72675: Improve key evictions in VectorGroupByOperator

2020-07-20 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72675/
---

(Updated July 21, 2020, 2:10 a.m.)


Review request for hive, Ashutosh Chauhan and Nita Dembla.


Changes
---

Got rid of LinkedHashMap and simplified the patch.


Repository: hive-git


Description
---

HIVE-23843: Improve key evictions in VectorGroupByOperator


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorAggregationBufferRow.java 494db35b97 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 9f81e8edfd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorGroupByOperator.java 3835987170 


Diff: https://reviews.apache.org/r/72675/diff/2/

Changes: https://reviews.apache.org/r/72675/diff/1-2/


Testing
---


Thanks,

Rajesh Balamohan



[jira] [Created] (HIVE-23886) Filter Query on External table produce no result if hive.metastore.expression.proxy set to MsckPartitionExpressionProxy

2020-07-20 Thread Rajkumar Singh (Jira)
Rajkumar Singh created HIVE-23886:
-

 Summary: Filter Query on External table produce no result if 
hive.metastore.expression.proxy set to MsckPartitionExpressionProxy
 Key: HIVE-23886
 URL: https://issues.apache.org/jira/browse/HIVE-23886
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: Rajkumar Singh


A query such as "select count(1) from tpcds_10_parquet.store_returns where 
sr_returned_date_sk=2452802" returns a row count of 0 even though the partition 
has plenty of rows in it.

Upon investigation, I found that the partition list passed to 
StatsUtils.getNumRows is empty:
https://github.com/apache/hive/blob/ccaf783a198e142b408cb57415c4262d27b45831/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L438-L439

It seems the partition list is retrieved by the PartitionPruner:
https://github.com/apache/hive/blob/36bf7f00731e3b95af3e5eeaa4ce39b375974a74/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L439

Hive serializes this filter expression using Kryo before passing it to HMS:

https://github.com/apache/hive/blob/36bf7f00731e3b95af3e5eeaa4ce39b375974a74/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3931

On the server side, if hive.metastore.expression.proxy is set to 
MsckPartitionExpressionProxy, the metastore tries to convert this expression 
into a string:

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java#L50

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java#L56

Because of this bad filter expression, Hive does not retrieve any partitions. 
I think that to make this work, Hive should try to deserialize the expression 
the same way PartitionExpressionForMetastore does.
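
As a rough illustration of that idea (not the actual fix, and not the real 
PartitionExpressionProxy interface), the sketch below deserializes the Kryo 
bytes back into an expression tree and renders it as a filter string instead of 
treating the raw bytes as text; the class and method names (KryoFilterDecoder, 
toFilterString) are made up for the example.

{code:java}
// Illustrative sketch only: decode the Kryo-serialized partition filter the
// way PartitionExpressionForMetastore does, instead of interpreting the raw
// bytes as a string (which produces a bad filter and zero partitions).
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.ql.exec.SerializationUtilities;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;

public final class KryoFilterDecoder {

  private KryoFilterDecoder() {
  }

  /** Hypothetical helper: turn the serialized expression into a filter string. */
  public static String toFilterString(byte[] serializedExpr) throws MetaException {
    if (serializedExpr == null || serializedExpr.length == 0) {
      return null; // nothing to filter on
    }
    try {
      // Same deserialization path PartitionExpressionForMetastore relies on.
      ExprNodeGenericFuncDesc expr =
          SerializationUtilities.deserializeExpressionFromKryo(serializedExpr);
      // Renders something like (sr_returned_date_sk = 2452802).
      return expr.getExprString();
    } catch (Exception e) {
      throw new MetaException("Failed to deserialize partition filter expression: " + e);
    }
  }
}
{code}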



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23885) Remove Hive on Spark

2020-07-20 Thread David Mollitor (Jira)
David Mollitor created HIVE-23885:
-

 Summary: Remove Hive on Spark
 Key: HIVE-23885
 URL: https://issues.apache.org/jira/browse/HIVE-23885
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23884) SemanticAnalyzer exception when addressing field with table name in group by

2020-07-20 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23884:
---

 Summary: SemanticAnalyzer exception when addressing field with 
table name in group by
 Key: HIVE-23884
 URL: https://issues.apache.org/jira/browse/HIVE-23884
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan


{noformat}
explain cbo 
select  `item`.`i_item_id`,
`store`.`s_state`, grouping(s_state) `g_state` from  
`tpcds_bin_partitioned_orc_1`.`store`, 
`tpcds_bin_partitioned_orc_1`.`item`
where `store`.`s_state` in ('AL','IN', 'SC', 'NY', 'OH', 'FL')
group by rollup (`item`.`i_item_id`, `s_state`)

CBO PLAN:


HiveProject(i_item_id=[$0], s_state=[$1], g_state=[grouping($2, 0:BIGINT)])
  HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], GROUPING__ID=[GROUPING__ID()])
    HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(i_item_id=[$1])
        HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, item]], table:alias=[item])
      HiveProject(s_state=[$24])
        HiveFilter(condition=[IN($24, _UTF-16LE'AL', _UTF-16LE'IN', _UTF-16LE'SC', _UTF-16LE'NY', _UTF-16LE'OH', _UTF-16LE'FL')])
          HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, store]], table:alias=[store])
{noformat}
 

However, using the fully qualified field name `store`.`s_state` in the rollup 
of the second query throws a SemanticAnalyzer exception:

 
{noformat}
explain cbo 
select  `item`.`i_item_id`,
`store`.`s_state`, grouping(s_state) `g_state` from  
`tpcds_bin_partitioned_orc_1`.`store`, 
`tpcds_bin_partitioned_orc_1`.`item`
where `store`.`s_state` in ('AL','IN', 'SC', 'NY', 'OH', 'FL')
group by rollup (`item`.`i_item_id`, `store`.`s_state`)

Error: Error while compiling statement: FAILED: RuntimeException [Error 10409]: Expression in GROUPING function not present in GROUP BY (state=42000,code=10409)

{noformat}
The exception below is from 3.x, but it most likely occurs in master as well.

Related ticket: https://issues.apache.org/jira/browse/HIVE-15996
{noformat}
Caused by: java.lang.RuntimeException: Expression in GROUPING function not present in GROUP BY
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$2.post(SemanticAnalyzer.java:3296) ~[hive-exec-3.1xyz]
    at org.antlr.runtime.tree.TreeVisitor.visit(TreeVisitor.java:66) ~[antlr-runtime-3.5.2.jar:3.5.2]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteGroupingFunctionAST(SemanticAnalyzer.java:3305) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4616) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4392) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11026) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10965) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11894) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11764) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12568) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:707) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12669) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:426) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:170) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538) ~[hive-exec-3.1xyz]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1xyz
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23883) Streaming does not flush the side file

2020-07-20 Thread Peter Vary (Jira)
Peter Vary created HIVE-23883:
-

 Summary: Streaming does not flush the side file
 Key: HIVE-23883
 URL: https://issues.apache.org/jira/browse/HIVE-23883
 Project: Hive
  Issue Type: Bug
  Components: Streaming, Transactions
Reporter: Peter Vary


When a streaming write commits mid-batch with {{connection.commitTransaction()}}, 
it tries to flush the side file with {{OrcInputFormat.SHIMS.hflush(flushLengths)}}. 
This ends up in FSOutputSummer.flush, which does not flush the buffered data to 
disk, so the actual data is not written.

The check at the end of the streaming tests in {{TestCrudCompactorOnTez.java}} 
had to be removed:
{code:java}
  CompactorTestUtilities.checkAcidVersion(fs.listFiles(new Path(table.getSd().getLocation()), true), fs,
      conf.getBoolVar(HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE),
      new String[] { AcidUtils.DELTA_PREFIX });
{code}
These checks verify the {{_flush_length}} files and would fail otherwise.
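
For illustration only (this is not the Hive streaming or test code; the path 
and the length value below are invented), the standalone sketch shows the 
difference the ticket is about: a plain flush() versus an HDFS hflush()/hsync() 
on the side file's output stream.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SideFileFlushExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Made-up side-file path, standing in for a real *_flush_length file.
    Path sideFile = new Path("/tmp/example_delta/bucket_00000_flush_length");

    try (FSDataOutputStream out = fs.create(sideFile, true)) {
      out.writeLong(1024L); // pretend this is the committed length

      // flush() only drains the client-side buffers; on HDFS the bytes may
      // still not be visible to readers or survive a crash. This is the
      // behaviour the ticket attributes to FSOutputSummer.flush().
      out.flush();

      // hflush() pushes the buffered data out to the DataNode pipeline so new
      // readers can see it; hsync() additionally asks the DataNodes to persist
      // it to disk. A mid-batch commit needs one of these for the side file.
      out.hflush();
      // out.hsync();
    }
  }
}
{code}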



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23882) Compiler should skip MJ keyExpr for probe optimization

2020-07-20 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23882:
-

 Summary: Compiler should skip MJ keyExpr for probe optimization
 Key: HIVE-23882
 URL: https://issues.apache.org/jira/browse/HIVE-23882
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Probe decode currently cannot support key expressions on the big-table side, 
because the ORC ColumnVectors probe the small-table hash table directly (there 
is no expression evaluation at that level).

TezCompiler should take this into account when picking MapJoins to push probe 
details to.
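
As a rough sketch of that check (not the actual TezCompiler change; the helper 
name is hypothetical and the MapJoinDesc accessors used here may differ between 
versions), the idea is to treat a MapJoin as a probe-decode candidate only when 
every big-table key is a bare column reference:

{code:java}
import java.util.List;

import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
import org.apache.hadoop.hive.ql.plan.MapJoinDesc;

public final class ProbeDecodeKeyCheck {

  private ProbeDecodeKeyCheck() {
  }

  /** Hypothetical helper: true only when all big-table keys are plain columns. */
  public static boolean isProbeDecodeCandidate(MapJoinDesc desc) {
    List<ExprNodeDesc> bigTableKeys = desc.getKeys().get((byte) desc.getPosBigTable());
    if (bigTableKeys == null || bigTableKeys.isEmpty()) {
      return false;
    }
    for (ExprNodeDesc key : bigTableKeys) {
      // Any non-column key (cast, UDF call, arithmetic, ...) would need to be
      // evaluated before probing, which the ORC-level probe cannot do, so the
      // compiler should skip pushing probe details for this MapJoin.
      if (!(key instanceof ExprNodeColumnDesc)) {
        return false;
      }
    }
    return true;
  }
}
{code}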



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23881) Deprecate get_open_txns to use get_open_txns_req method.

2020-07-20 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-23881:
--

 Summary: Deprecate get_open_txns to use get_open_txns_req method.
 Key: HIVE-23881
 URL: https://issues.apache.org/jira/browse/HIVE-23881
 Project: Hive
  Issue Type: Task
Reporter: Aasha Medhi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23879) Data has been lost after table location was altered

2020-07-20 Thread Demyd (Jira)
Demyd created HIVE-23879:


 Summary: Data has been lost after table location was altered
 Key: HIVE-23879
 URL: https://issues.apache.org/jira/browse/HIVE-23879
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Demyd


When I alter the location of a non-empty table and then insert data into it, I 
no longer see the old data when querying through HS2, but I can still find it 
in MapR-FS under the old table location.

Steps to reproduce:
1. Connect to HS2 with beeline:
hive --service beeline -u "jdbc:hive2://:1/;"

2. Create a test database:
create database dbtest1 location 'hdfs:///dbtest1.db';

3. Create a test table:
create table dbtest1.t1 (id int);

4. Insert data into the table:
insert into dbtest1.t1 (id) values (1);

5. Set a new table location:
alter table dbtest1.t1 set location 'hdfs:///dbtest1a/t1';

6. Insert data into the table:
insert into dbtest1.t1 (id) values (2);



Actual result:
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
1 row selected (0.097 seconds)



Expected result:
jdbc:hive2://:> select * from dbtest1.t1;
+--------+
| t1.id  |
+--------+
| 2      |
+--------+
| 1      |
+--------+
2 rows selected (0.097 seconds)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-07-20 Thread László Bodor (Jira)
László Bodor created HIVE-23880:
---

 Summary: Bloom filters can be merged in a parallel way in 
VectorUDAFBloomFilterMerge
 Key: HIVE-23880
 URL: https://issues.apache.org/jira/browse/HIVE-23880
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: 【Hive Alter Table Add column at specified position】

2020-07-20 Thread Rui Li
Yeah, according to our DDL doc, we don't support this use case at the
moment. Perhaps you can use REPLACE COLUMNS as a workaround.
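For example (illustrative only, assuming the c1/c2/c3 columns from the question
below are INTs on a table with a native SerDe), something like
ALTER TABLE a REPLACE COLUMNS (c1 INT, c4 INT, c2 INT, c3 INT);
should give the c1, c4, c2, c3 order. Keep in mind that REPLACE COLUMNS only
updates the table metadata, so check how existing data files line up with the
new column order before relying on it.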

On Sat, Jun 27, 2020 at 5:32 PM 忝忝向仧 <153488...@qq.com> wrote:

> Hi,all:
>
>
> It seems that Hive cannot alter a table to add a column at a specified
> position.
> For instance, Table A has columns c1, c2, c3, and I want to add column c4
> after c1, so that the table would be like c1, c4, c2, c3 instead of
> c1, c2, c3, c4.
>
>
> Thanks.



-- 
Best regards!
Rui Li