[Announce] New committer : Denys Kuzmenko

2020-02-14 Thread Ashutosh Chauhan
Apache Hive's Project Management Committee (PMC) has invited Denys Kuzmenko
to become a committer, and we are pleased to announce that he has accepted.

Denys, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Thanks,
Ashutosh


Re: Review Request 71962: HIVE-22699: Fix mask functions not masking numeric 0

2020-02-14 Thread Quanlong Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71962/
---

(Updated Feb. 15, 2020, 12:06 a.m.)


Review request for hive and Raj Kumar.


Repository: hive-git


Description
---

For the numeric value 0 (tinyint, smallint, int, bigint), the mask functions do 
not mask it correctly. It should be replaced with the 'numberChar' value when 
masked.
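
For illustration, a minimal JDBC sketch of the intended behaviour; the connection 
URL and the assumption that the default mask digit for numeric values is 1 are 
mine, not part of the patch:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MaskZeroCheck {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Assumed local HiveServer2 endpoint; adjust for your environment.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT mask(123), mask(0)")) {
      rs.next();
      System.out.println(rs.getInt(1)); // 111 -- every digit replaced by the number mask char
      System.out.println(rs.getInt(2)); // expected 1 once fixed; the bug returned 0 unmasked
    }
  }
}
{code}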


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java 27c3bf8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskFirstN.java 
76ee292 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskLastN.java 
c0c5c61 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowFirstN.java 
a8f70f2 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowLastN.java 
c72d75e 
  ql/src/test/queries/clientpositive/udf_mask.q 15f7d27 
  ql/src/test/queries/clientpositive/udf_mask_first_n.q 3cd3962 
  ql/src/test/queries/clientpositive/udf_mask_last_n.q 89eb05d 
  ql/src/test/queries/clientpositive/udf_mask_show_first_n.q 1425a82 
  ql/src/test/queries/clientpositive/udf_mask_show_last_n.q c4d15fb 
  ql/src/test/results/clientpositive/udf_mask.q.out ed01449 
  ql/src/test/results/clientpositive/udf_mask_first_n.q.out e33fb42 
  ql/src/test/results/clientpositive/udf_mask_last_n.q.out 07a254f 
  ql/src/test/results/clientpositive/udf_mask_show_first_n.q.out 3ec3270 
  ql/src/test/results/clientpositive/udf_mask_show_last_n.q.out 4dd42fd 


Diff: https://reviews.apache.org/r/71962/diff/2/


Testing
---


Thanks,

Quanlong Huang



Re: Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'

2020-02-14 Thread Laszlo Bodor


> On Feb. 14, 2020, 3:23 p.m., Panos Garefalakis wrote:
> > +1 from me as well.
> > As discussed, the extra 0s in the q.out files are caused by 
> > counters.findCounter(), which initializes an empty counter when reading a 
> > non-existent counter -- then all HIVE_COUNTERS are printed by 
> > PostExecTezSummaryPrinter.

I like this patch, and I would be happy to see a simple unit test (which would 
actually be an integration test) that does a simple update and checks the 
improved output. I believe we already have similar tests.


- Laszlo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72113/#review219591
---


On Feb. 13, 2020, 3:40 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72113/
> ---
> 
> (Updated Feb. 13, 2020, 3:40 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and 
> Ramesh Kumar Thangarajan.
> 
> 
> Bugs: HIVE-22870
> https://issues.apache.org/jira/browse/HIVE-22870
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing an update or insert statement in beeline doesn't show the actual 
> number of rows inserted/updated.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 9c5695ae603 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 
> f9b5f8f0d4d 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 9ad0a9b7faf 
>   ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 
> 3e99e0ee627 
>   ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out 
> baeac434d79 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> 885cb0a9cba 
> 
> 
> Diff: https://reviews.apache.org/r/72113/diff/2/
> 
> 
> Testing
> ---
> 
> with inserts and updates
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-22894) Filter on subquery with GROUP BY returns wrong column

2020-02-14 Thread Igor Dvorzhak (Jira)
Igor Dvorzhak created HIVE-22894:


 Summary: Filter on subquery with GROUP BY returns wrong column
 Key: HIVE-22894
 URL: https://issues.apache.org/jira/browse/HIVE-22894
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.3.6
Reporter: Igor Dvorzhak


Reproduction steps:
{code:java}
$ echo -e "02/11/20,C_A,C_A_B\n02/11/20,C_A,C_A_C" | hadoop fs -put - 
/user/hive/warehouse/test/data.csv

$ hive

> CREATE TABLE test(date_str STRING, category STRING, subcategory STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
OK
Time taken: 0.877 seconds

> SELECT DISTINCT category FROM (SELECT date_str, category, subcategory FROM 
> test WHERE date_str='02/11/20' GROUP BY date_str, category, subcategory) AS t 
> WHERE t.category='C_A';
OK
C_A_B
C_A_C
Time taken: 9.108 seconds, Fetched: 2 row(s)

> EXPLAIN SELECT DISTINCT category FROM (SELECT date_str, category, subcategory 
> FROM test WHERE date_str='02/11/20' GROUP BY date_str, category, subcategory) 
> AS t WHERE t.category='C_A';
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 2
  File Output Operator [FS_12]
Group By Operator [GBY_10] (rows=1 width=38)
  Output:["_col0"],keys:_col0
  Group By Operator [GBY_5] (rows=1 width=38)
Output:["_col0"],keys:KEY._col0
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_4]
  PartitionCols:_col0
  Group By Operator [GBY_3] (rows=1 width=38)
Output:["_col0"],keys:subcategory
Select Operator [SEL_2] (rows=1 width=38)
  Output:["subcategory"]
  Filter Operator [FIL_13] (rows=1 width=38)
predicate:((date_str = '02/11/20') and (category = 'C_A'))
TableScan [TS_0] (rows=1 width=38)
  
default@test,test,Tbl:COMPLETE,Col:NONE,Output:["date_str","category","subcategory"]
Time taken: 0.21 seconds, Fetched: 27 row(s)
{code}
 

It works as expected with disabled CBO:

{code:java}
> SET hive.cbo.enable=false;

> SELECT DISTINCT category FROM (SELECT date_str, category, subcategory FROM 
> test WHERE date_str='02/11/20' GROUP BY date_str, category, subcategory) AS t 
> WHERE t.category='C_A';
OK
C_A
Time taken: 13.948 seconds, Fetched: 1 row(s)

> EXPLAIN SELECT DISTINCT category FROM (SELECT date_str, category, subcategory 
> FROM test WHERE date_str='02/11/20' GROUP BY date_str, category, subcategory) 
> AS t WHERE t.category='C_A';
OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 3
  File Output Operator [FS_13]
Select Operator [SEL_12] (rows=1 width=38)
  Output:["_col0"]
  Group By Operator [GBY_11] (rows=1 width=38)
Output:["_col0"],keys:'C_A'
  <-Reducer 2 [SIMPLE_EDGE]
SHUFFLE [RS_10]
  PartitionCols:'C_A'
  Group By Operator [GBY_9] (rows=1 width=38)
Output:["_col0"],keys:'C_A'
Select Operator [SEL_6] (rows=1 width=38)
  Group By Operator [GBY_5] (rows=1 width=38)
Output:["_col0","_col1","_col2"],keys:'02/11/20', 'C_A', 
KEY._col2
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_4]
  PartitionCols:'02/11/20', 'C_A', _col2
  Group By Operator [GBY_3] (rows=1 width=38)
Output:["_col0","_col1","_col2"],keys:'02/11/20', 
'C_A', subcategory
Select Operator [SEL_2] (rows=1 width=38)
  Output:["subcategory"]
  Filter Operator [FIL_14] (rows=1 width=38)
predicate:((date_str = '02/11/20') and (category = 
'C_A'))
TableScan [TS_0] (rows=1 width=38)
  
default@test,test,Tbl:COMPLETE,Col:NONE,Output:["date_str","category","subcategory"]
Time taken: 0.065 seconds, Fetched: 34 row(s)
{code}
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22893) Enhance data size estimation for fields computed by UDFs

2020-02-14 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-22893:
---

 Summary: Enhance data size estimation for fields computed by UDFs
 Key: HIVE-22893
 URL: https://issues.apache.org/jira/browse/HIVE-22893
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Right now, if we have column statistics on a column, we use them to estimate 
things about that column; however, if a UDF is executed on the column, the 
resulting column is treated as an unknown and defaults are assumed.

An improvement could be to give wide estimations for frequently used UDFs.

For example, consider {{substr(c,1,1)}}: no matter what the input is, the output 
is at most a 1-character-long string.
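
As a rough illustration of the idea (the method shape and names below are made 
up for this sketch; they are not Hive's statistics-annotation API):

{code:java}
public class UdfWidthEstimator {
  // Illustrative sketch: a per-UDF upper bound on the estimated output width.
  static long estimateOutputWidth(String udfName, long[] constantArgs, long inputAvgWidth) {
    if ("substr".equals(udfName) && constantArgs.length >= 2) {
      // substr(c, start, len): the result can never be longer than len (constantArgs[1]),
      // no matter what the column statistics of c say.
      return Math.min(constantArgs[1], inputAvgWidth);
    }
    // Unknown UDF: keep today's behaviour and fall back to the default estimate.
    return inputAvgWidth;
  }

  public static void main(String[] args) {
    // substr(c, 1, 1) on a column with an average width of 38 -> bounded to 1
    System.out.println(estimateOutputWidth("substr", new long[] {1, 1}, 38));
  }
}
{code}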



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-14 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/
---

(Updated Feb. 14, 2020, 4:10 p.m.)


Review request for hive, Gopal V and Peter Vary.


Changes
---

Addressed the review findings


Bugs: HIVE-21164
https://issues.apache.org/jira/browse/HIVE-21164


Repository: hive-git


Description
---

Extended the original patch to save the task attempt ids in the file names, and 
also fixed some bugs in the original patch.
With this fix, inserting into an ACID table no longer uses a move task to place 
the generated files into the final directory. It writes every file directly to 
the final directory and then cleans up the files which are not needed (such as 
files written by failed task attempts).
Also fixed the replication tests which failed for the original patch.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2f695d4acc 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 da677c7977 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
056cd27496 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
 31d15fdef9 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
 c2aa73b5f1 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
 4c0137 
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
9a3258115b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 8980a6292a 
  ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8477 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
b8a0f0465c 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 398698ec06 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 2543dc6fc4 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 945eafc034 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
73ca658d9c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 33d3beba46 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
c102a69f8f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
bb70db4524 
  ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289583 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb6692df 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 842140815d 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 88ca683173 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 908ceb43fc 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnConcatenate.java 8676e0db11 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 66b2b2768b 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java bb55d9fd79 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java ea6b1d9bec 
  ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java af14e628b3 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 83db48e758 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java 
2c4b69b2fe 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
48e9afc496 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java 
cfd7290762 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java 
70ae85c458 
  ql/src/test/queries/clientpositive/tez_acid_union_dynamic_partition.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/tez_acid_union_dynamic_partition_2.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/tez_acid_union_multiinsert.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_subquery.q.out 1dc1775557 
  ql/src/test/results/clientpositive/create_transactional_full_acid.q.out 
e324d5ec43 
  
ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out
 61b0057adb 
  ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out fbf4e481f1 
  

[jira] [Created] (HIVE-22892) Unable to compile query if CTE joined

2020-02-14 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-22892:
-

 Summary: Unable to compile query if CTE joined
 Key: HIVE-22892
 URL: https://issues.apache.org/jira/browse/HIVE-22892
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Repro:
{code}
CREATE TABLE t1 (a int, b varchar(100));

SELECT S.a, t1.a, t1.b FROM (
WITH
 sub1 AS (SELECT a, b FROM t1 WHERE b = 'c')
 SELECT sub1.a, sub1.b FROM sub1
) S
JOIN t1 ON S.a = t1.a;
{code}
{code}
java.lang.AssertionError
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getUnescapedUnqualifiedTableName(BaseSemanticAnalyzer.java:463)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genJoinLogicalPlan(CalcitePlanner.java:2870)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5047)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1787)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1734)
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1495)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:471)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12550)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:286)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:197)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:810)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:756)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:750)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:249)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:193)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:415)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:346)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:709)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:679)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:169)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 

Re: Review Request 72113: DML execution on TEZ always outputs the message 'No rows affected'

2020-02-14 Thread Panos Garefalakis via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72113/#review219591
---


Ship it!




+1 from me as well.
As discussed, the extra 0s in the q.out files are caused by 
counters.findCounter(), which initializes an empty counter when reading a 
non-existent counter -- then all HIVE_COUNTERS are printed by 
PostExecTezSummaryPrinter.
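
For illustration, a minimal sketch of that behaviour against the Tez counters 
API (the group and counter names below are made up):

{code:java}
import org.apache.tez.common.counters.TezCounter;
import org.apache.tez.common.counters.TezCounters;

public class PhantomCounterDemo {
  public static void main(String[] args) {
    TezCounters counters = new TezCounters();
    // Merely reading a counter that was never incremented creates it with value 0...
    long rows = counters.findCounter("HIVE", "RECORDS_OUT_1_default.t").getValue();
    System.out.println(rows); // 0
    // ...and from then on it shows up whenever the whole group is iterated and printed.
    for (TezCounter c : counters.getGroup("HIVE")) {
      System.out.println(c.getName() + " = " + c.getValue());
    }
  }
}
{code}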

- Panos Garefalakis


On Feb. 13, 2020, 3:40 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72113/
> ---
> 
> (Updated Feb. 13, 2020, 3:40 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Mustafa Iman, Panos Garefalakis, and 
> Ramesh Kumar Thangarajan.
> 
> 
> Bugs: HIVE-22870
> https://issues.apache.org/jira/browse/HIVE-22870
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing an update or insert statement in beeline doesn't show the actual 
> number of rows inserted/updated.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25dd970a9b1 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out 9c5695ae603 
>   ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out 
> f9b5f8f0d4d 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 9ad0a9b7faf 
>   ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out 
> 3e99e0ee627 
>   ql/src/test/results/clientpositive/llap/retry_failure_reorder.q.out 
> baeac434d79 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> 885cb0a9cba 
> 
> 
> Diff: https://reviews.apache.org/r/72113/diff/2/
> 
> 
> Testing
> ---
> 
> with inserts and updates
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-22891) Skip PartitionDesc Extraction In CombineHiveRecordReader For Non-LLAP Execution Mode

2020-02-14 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created HIVE-22891:


 Summary: Skip PartitionDesc Extraction In CombineHiveRecordReader For 
Non-LLAP Execution Mode
 Key: HIVE-22891
 URL: https://issues.apache.org/jira/browse/HIVE-22891
 Project: Hive
  Issue Type: Task
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman
 Fix For: 4.0.0


{code:java}
try {
  // TODO: refactor this out
  if (pathToPartInfo == null) {
    MapWork mrwork;
    if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
      mrwork = (MapWork) Utilities.getMergeWork(jobConf);
      if (mrwork == null) {
        mrwork = Utilities.getMapWork(jobConf);
      }
    } else {
      mrwork = Utilities.getMapWork(jobConf);
    }
    pathToPartInfo = mrwork.getPathToPartitionInfo();
  }

  PartitionDesc part = extractSinglePartSpec(hsplit);
  inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
} catch (HiveException e) {
  throw new IOException(e);
}
{code}
The above piece of code in CombineHiveRecordReader.java was introduced in 
HIVE-15147. It overwrites inputFormat based on the PartitionDesc, which is not 
required in non-LLAP execution mode because HiveInputFormat.wrapForLlap() simply 
returns the previously defined inputFormat in that case. The call to 
extractSinglePartSpec() has serious performance implications: if there are a 
large number of small files, each call to extractSinglePartSpec() takes roughly 
2-3 seconds. As a result, the same query that runs in Hive 1.x / Hive 2 is much 
faster than when run on the latest Hive. A sketch of the kind of guard this 
suggests follows the log snippet below.
{code:java}
2020-02-11 07:15:04,701 INFO [main] org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from  

2020-02-11 07:15:06,468 WARN [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}

2020-02-11 07:15:06,468 INFO [main] org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting org.apache.hadoop.mapred.FileSplit
{code}
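
A minimal sketch of the kind of guard this suggests for the quoted block; this 
is not the actual patch, and using LlapProxy.isDaemon() as the "running in LLAP" 
signal is an assumption of the sketch:

{code:java}
// Sketch only: skip the expensive PartitionDesc lookup when LLAP IO cannot use it.
if (org.apache.hadoop.hive.llap.io.api.LlapProxy.isDaemon()) {
  PartitionDesc part = extractSinglePartSpec(hsplit);
  inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
}
// In non-LLAP mode wrapForLlap() returns inputFormat unchanged anyway, so skipping
// both calls keeps the behaviour while avoiding the 2-3 second cost per split.
{code}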



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22890) Repl load tries to load function if table name contains _function

2020-02-14 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-22890:
--

 Summary: Repl load tries to load function if table name contains 
_function
 Key: HIVE-22890
 URL: https://issues.apache.org/jira/browse/HIVE-22890
 Project: Hive
  Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi


Repl load tries to load a function if the table name contains _functions. 
Similarly for the constants below:
{code:java}
public static final String FUNCTIONS_ROOT_DIR_NAME = "_functions";
public static final String CONSTRAINTS_ROOT_DIR_NAME = "_constraints";
public static final String INC_BOOTSTRAP_ROOT_DIR_NAME = "_bootstrap";
public static final String REPL_TABLE_LIST_DIR_NAME = "_tables";
{code}
The code just checks contains(FUNCTIONS_ROOT_DIR_NAME), so if any table or db 
name contains _functions, it takes the function load flow. The same happens for 
constraints, etc.
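
A minimal, self-contained sketch of the failure mode and of a stricter check; 
the method names are illustrative, not the actual repl load code:

{code:java}
public class PathCheckDemo {
  static final String FUNCTIONS_ROOT_DIR_NAME = "_functions";

  // Current behaviour: substring match, so a table dir named "sales_functions_audit" matches too.
  static boolean looksLikeFunctionDirLoose(String dumpPath) {
    return dumpPath.contains(FUNCTIONS_ROOT_DIR_NAME);
  }

  // Stricter alternative: only match when a whole path component equals the constant.
  static boolean looksLikeFunctionDirStrict(String dumpPath) {
    for (String component : dumpPath.split("/")) {
      if (component.equals(FUNCTIONS_ROOT_DIR_NAME)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String tableDir = "/repl/dump/db1/sales_functions_audit";
    System.out.println(looksLikeFunctionDirLoose(tableDir));  // true  -- wrongly takes the function load flow
    System.out.println(looksLikeFunctionDirStrict(tableDir)); // false -- stays on the table load flow
  }
}
{code}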



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-14 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
> > Lines 1444 (patched)
> > 
> >
> > Why is this null?

It is null because, if the union all optimization is on, the different union 
statements are translated into different FileSinkOperators and each of them 
writes to its own separate directory. They normally write to the staging 
directory, under folders with the specific 'HIVE_UNION_SUBDIR_' prefix, and the 
move tasks then move these files to the final table directory. For ACID tables 
these FileSinkOperators would write to different delta directories anyway, so 
the tasks can write directly to the final table location instead of the 
'HIVE_UNION_SUBDIR_' folders. That's why the unionSuffix is null here; in the 
other cases it has the 'HIVE_UNION_SUBDIR_' value.
Btw, I locally modified many union q tests to run with ACID tables and ran them 
with MR and Tez. I found one bug, which I fixed, and I also added some union q 
tests that run with an ACID table and direct insert.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java
> > Lines 77 (patched)
> > 
> >
> > We created this variable - we should use it? Maybe even make it a 
> > constant?

You're right. I moved this to a constant and changed the tests.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
---


On Jan. 31, 2020, 4:12 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Jan. 31, 2020, 4:12 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch to save the task attempt ids in the file names, 
> and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table no longer uses a move task to 
> place the generated files into the final directory. It writes every file 
> directly to the final directory and then cleans up the files which are not 
> needed (such as files written by failed task attempts).
> Also fixed the replication tests which failed for the original patch.
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  4c0137 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7f061d4a6b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 5fcc367cc9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> c102a69f8f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 
>   

Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

2020-02-14 Thread Denys Kuzmenko via Review Board


> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > 
> >
> > Hi Rajesh, could you please explain what the reason is for doing 
> > partition filtering on the HMS side rather than in the backend db?
> 
> Rajesh Balamohan wrote:
> By adding all the partition details, the query can become large and risks 
> overflowing in the case of Oracle (i.e. it has to be batched in groups of 1000 
> entries). Also, it incurs parsing on the SQL server side, as it is executed as 
> a Statement. Given that we have the additional filter now, it is a lot simpler 
> to do this on the client side. This was pointed out in the JIRA by Gopal.

Can we rewrite the query with a JOIN operator? Something like:
https://issues.apache.org/jira/browse/HIVE-22888
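
For illustration only, the two query shapes being discussed; the HIVE_LOCKS/HL_* 
names follow the metastore lock table referenced above, while the helper table 
REQUESTED_LOCK_COMPONENTS and its RC_* columns are hypothetical and not taken 
from HIVE-22888:

{code:java}
// Current shape: partition names are inlined, so the IN list has to be batched
// (e.g. around Oracle's 1000-element limit) and re-parsed each time as a plain Statement.
String inListQuery =
    "SELECT * FROM HIVE_LOCKS"
  + " WHERE HL_DB = ? AND HL_TABLE = ? AND HL_PARTITION IN (?, ?, ? /* ...batched... */)";

// Join shape: the requested (db, table, partition) triples live in a helper table,
// so the statement text stays constant-size regardless of how many partitions are locked.
String joinQuery =
    "SELECT hl.* FROM HIVE_LOCKS hl"
  + " JOIN REQUESTED_LOCK_COMPONENTS rc"
  + "   ON hl.HL_DB = rc.RC_DB AND hl.HL_TABLE = rc.RC_TABLE AND hl.HL_PARTITION = rc.RC_PART";
{code}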


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
---


On Feb. 14, 2020, 1:27 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> ---
> 
> (Updated Feb. 14, 2020, 1:27 a.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> - The main change is in TxnHandler::checkLock.
> - When all incoming requests are SHARED_READ, we can add a condition in 
> the query to retrieve only the relevant rows. This avoids a significant number 
> of rows being fetched in the form of "SHARED_READ + ACQUIRED". There is a 
> corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is 
> misleading in the jump table. This condition can be optimised later.
> - Also, removed the "HL_PARTITION IN" clause, which could potentially 
> overflow for Oracle. Partition details can be filtered out if the earlier 
> query actually returned any rows.
> - The rest of the changes are related to refactoring 
> "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/3/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-22850.5.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



[jira] [Created] (HIVE-22889) Trim trailing and leading quotes for HCatCli query processing

2020-02-14 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-22889:
---

 Summary: Trim trailing and leading quotes for HCatCli query 
processing
 Key: HIVE-22889
 URL: https://issues.apache.org/jira/browse/HIVE-22889
 Project: Hive
  Issue Type: Bug
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


Trim trailing and leading quotes for HCatCli query processing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)