[jira] [Assigned] (HIVE-24933) Replication fails for transactional tables having same name as dropped non-transactional table

2021-04-13 Thread Pratyushotpal Madhukar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyushotpal Madhukar reassigned HIVE-24933:
-

Assignee: Pratyushotpal Madhukar

> Replication fails for transactional tables having same name as dropped 
> non-transactional table
> --
>
> Key: HIVE-24933
> URL: https://issues.apache.org/jira/browse/HIVE-24933
> Project: Hive
>  Issue Type: Bug
>Reporter: Pratyushotpal Madhukar
>Assignee: Pratyushotpal Madhukar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=582202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582202
 ]

ASF GitHub Bot logged work on HIVE-24978:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 05:11
Start Date: 14/Apr/21 05:11
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #2154:
URL: https://github.com/apache/hive/pull/2154#discussion_r612939857



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -4294,6 +4295,9 @@ public void testMoveOptimizationIncremental() throws 
IOException {
 
   @Test
   public void testDatabaseInJobName() throws Throwable {
+// Clean up configurations
+driver.getConf().set(JobContext.JOB_NAME, "");

Review comment:
   can this clean up be done at tear down?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582202)
Time Spent: 50m  (was: 40m)

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Even when partitions are dropped in batches, there is presently one event for 
> every partition. Optimise by merging them to reduce the number of calls to the HMS.
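The batching idea above can be sketched with a small, self-contained example (plain Java; the class, record, and method names are hypothetical, not the actual metastore event API): instead of emitting one notification per dropped partition, consecutive drops are folded into one event per table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DropPartitionBatcher {
    /** One merged notification carrying every partition dropped from a table. */
    public record DropEvent(String table, List<String> partitions) {}

    /**
     * Fold a stream of (table, partition) drops into one event per table,
     * instead of one event (and one HMS call) per partition.
     */
    public static List<DropEvent> batch(List<Map.Entry<String, String>> drops) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (Map.Entry<String, String> d : drops) {
            byTable.computeIfAbsent(d.getKey(), k -> new ArrayList<>()).add(d.getValue());
        }
        List<DropEvent> events = new ArrayList<>();
        byTable.forEach((t, parts) -> events.add(new DropEvent(t, parts)));
        return events;
    }

    public static void main(String[] args) {
        List<DropEvent> events = batch(List.of(
            Map.entry("sales", "dt=2021-04-01"),
            Map.entry("sales", "dt=2021-04-02"),
            Map.entry("sales", "dt=2021-04-03")));
        // One merged event instead of three per-partition events.
        System.out.println(events.size() + " event(s) for 3 dropped partitions");
    }
}
```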



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320702#comment-17320702
 ] 

Peter Vary commented on HIVE-25011:
---

CC: [~dkuzmenko]

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch, HIVE-25011.2.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.
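The intent can be illustrated with a minimal sketch (hypothetical names, not Hive's actual lock-manager code): lock acquisition is gated on whether the statement only compiles a plan. Note that EXPLAIN ANALYZE executes the query for real, so the sketch still requests locks for it.

```java
public class ExplainLockGate {
    /**
     * Hypothetical gate: a plain EXPLAIN only compiles the plan, so it need
     * not conflict with a concurrently running UPDATE. EXPLAIN ANALYZE
     * executes the query and therefore still needs locks.
     */
    public static boolean needsLocks(String sql) {
        String s = sql.trim().toUpperCase();
        if (!s.startsWith("EXPLAIN")) {
            return true;                         // ordinary DML/DDL: lock as usual
        }
        return s.startsWith("EXPLAIN ANALYZE");  // executes, so keep locking
    }

    public static void main(String[] args) {
        System.out.println(needsLocks("EXPLAIN UPDATE t SET a = 1"));  // false
    }
}
```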



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24969) Predicates are removed by PPD when left semi join followed by lateral view

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=582187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582187
 ]

ASF GitHub Bot logged work on HIVE-24969:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 04:19
Start Date: 14/Apr/21 04:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2145:
URL: https://github.com/apache/hive/pull/2145#issuecomment-819216494


   @kasakrisz @maheshk114 @jcamachor any thoughts here?
   Thanks,
   Zhihua Deng


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582187)
Time Spent: 1h  (was: 50m)

> Predicates are removed by PPD when left semi join followed by lateral view
> --
>
> Key: HIVE-24969
> URL: https://issues.apache.org/jira/browse/HIVE-24969
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Step to reproduce:
> {code:java}
> select count(distinct logItem.triggerId)
> from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem
> where logItem.dsp in ('delivery', 'ocpa')
> and logItem.iswin = true
> and logItem.adid in (
>  select distinct adId
>  from ad_info
>  where subAccountId in (16010, 14863));  {code}
> The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = 
> true_ are removed when doing PPD along JOIN -> RS -> LVJ. The JOIN has 
> candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = 
> true]. When pushing them to the RS followed by the LVJ, none of them are 
> pushed, and the candidates of logitem are finally removed by default, which 
> leads to the wrong result.
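The expected behavior can be sketched with a toy example (hypothetical names; the real logic lives in Hive's predicate pushdown optimizer): candidates that cannot be pushed past an operator should be retained as a filter at the current operator rather than silently discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PpdCandidates {
    /**
     * Split pushdown candidates: predicates the child operator accepts move
     * down; the rest must be retained as a filter at the current operator.
     * Discarding unpushed candidates is exactly what produced wrong results.
     */
    public static List<String> push(List<String> candidates,
                                    Predicate<String> childAccepts,
                                    List<String> retained) {
        List<String> pushed = new ArrayList<>();
        for (String p : candidates) {
            if (childAccepts.test(p)) {
                pushed.add(p);
            } else {
                retained.add(p);  // keep the predicate; never drop it silently
            }
        }
        return pushed;
    }

    public static void main(String[] args) {
        List<String> retained = new ArrayList<>();
        // The RS before the LVJ accepts nothing, as in the reported plan.
        List<String> pushed = push(
            List.of("logItem.dsp in ('delivery','ocpa')", "logItem.iswin = true"),
            p -> false, retained);
        System.out.println(pushed.size() + " pushed, " + retained.size() + " retained");
    }
}
```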



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25012) Parsing table alias is failing if query has table properties specified

2021-04-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25012 started by Krisztian Kasa.
-
> Parsing table alias is failing if query has table properties specified
> --
>
> Key: HIVE-25012
> URL: https://issues.apache.org/jira/browse/HIVE-25012
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from 
> t1('acid.fetch.deleted.rows'='true')
> join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a;
> {code}
> When creating Join RelNode the aliases are used to lookup left and right 
> input RelNodes. Aliases are extracted from the AST subtree of the left and 
> right inputs of the join AST node. In case of a table reference:
> {code}
> TOK_TABREF
>TOK_TABNAME
>   t1
>TOK_TABLEPROPERTIES
>   TOK_TABLEPROPLIST
>  TOK_TABLEPROPERTY
> 'acid.fetch.deleted.rows'
> 'true'
> {code} 
> Prior to HIVE-24854, queries like the one mentioned above failed because the 
> existing solution did not expect TOK_TABLEPROPERTIES.
> The goal of this patch is to parse TOK_TABREF properly, using the existing 
> solution also used in SemanticAnalyzer.doPhase1.
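The fix idea can be sketched with a toy AST model (the `Node` type and `alias` helper below are hypothetical; the real code works on Hive's ASTNode): alias extraction must tolerate an optional TOK_TABLEPROPERTIES child instead of assuming a fixed child layout under TOK_TABREF.

```java
import java.util.List;

public class TabRefAlias {
    /** Minimal stand-in for an AST node: a token plus child nodes. */
    public record Node(String token, List<Node> children) {}

    /**
     * Extract the alias of a TOK_TABREF: the last child that is a plain
     * identifier, or the table name itself when no alias is given. The lookup
     * skips structural children such as TOK_TABLEPROPERTIES rather than
     * assuming the alias sits at a fixed position.
     */
    public static String alias(Node tabref) {
        Node last = tabref.children().get(tabref.children().size() - 1);
        if (!last.token().startsWith("TOK_")) {
            return last.token();                 // explicit alias present
        }
        for (Node c : tabref.children()) {       // fall back to the table name
            if (c.token().equals("TOK_TABNAME")) {
                // last child covers both `table` and `db.table` forms
                return c.children().get(c.children().size() - 1).token();
            }
        }
        throw new IllegalArgumentException("not a TOK_TABREF");
    }

    public static void main(String[] args) {
        Node tabname = new Node("TOK_TABNAME", List.of(new Node("t1", List.of())));
        Node props = new Node("TOK_TABLEPROPERTIES", List.of());
        System.out.println(alias(new Node("TOK_TABREF", List.of(tabname, props))));  // t1
    }
}
```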



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25012) Parsing table alias is failing if query has table properties specified

2021-04-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25012:
--
Fix Version/s: 4.0.0

> Parsing table alias is failing if query has table properties specified
> --
>
> Key: HIVE-25012
> URL: https://issues.apache.org/jira/browse/HIVE-25012
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from 
> t1('acid.fetch.deleted.rows'='true')
> join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a;
> {code}
> When creating Join RelNode the aliases are used to lookup left and right 
> input RelNodes. Aliases are extracted from the AST subtree of the left and 
> right inputs of the join AST node. In case of a table reference:
> {code}
> TOK_TABREF
>TOK_TABNAME
>   t1
>TOK_TABLEPROPERTIES
>   TOK_TABLEPROPLIST
>  TOK_TABLEPROPERTY
> 'acid.fetch.deleted.rows'
> 'true'
> {code} 
> Prior to HIVE-24854, queries like the one mentioned above failed because the 
> existing solution did not expect TOK_TABLEPROPERTIES.
> The goal of this patch is to parse TOK_TABREF properly, using the existing 
> solution also used in SemanticAnalyzer.doPhase1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25012) Parsing table alias is failing if query has table properties specified

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25012:
--
Labels: pull-request-available  (was: )

> Parsing table alias is failing if query has table properties specified
> --
>
> Key: HIVE-25012
> URL: https://issues.apache.org/jira/browse/HIVE-25012
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from 
> t1('acid.fetch.deleted.rows'='true')
> join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a;
> {code}
> When creating Join RelNode the aliases are used to lookup left and right 
> input RelNodes. Aliases are extracted from the AST subtree of the left and 
> right inputs of the join AST node. In case of a table reference:
> {code}
> TOK_TABREF
>TOK_TABNAME
>   t1
>TOK_TABLEPROPERTIES
>   TOK_TABLEPROPLIST
>  TOK_TABLEPROPERTY
> 'acid.fetch.deleted.rows'
> 'true'
> {code} 
> Prior to HIVE-24854, queries like the one mentioned above failed because the 
> existing solution did not expect TOK_TABLEPROPERTIES.
> The goal of this patch is to parse TOK_TABREF properly, using the existing 
> solution also used in SemanticAnalyzer.doPhase1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25012) Parsing table alias is failing if query has table properties specified

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25012?focusedWorklogId=582166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582166
 ]

ASF GitHub Bot logged work on HIVE-25012:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 03:00
Start Date: 14/Apr/21 03:00
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2177:
URL: https://github.com/apache/hive/pull/2177


   ### What changes were proposed in this pull request?
   Move `getSimpleTableNameBase` and `findTabRefIdxs` to allow using them from 
`BaseSemanticAnalyzer.getTableAlias `
   
   ### Why are the changes needed?
   See jira.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=fetch_deleted_rows.q -pl itests/qtest -Pitests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582166)
Remaining Estimate: 0h
Time Spent: 10m

> Parsing table alias is failing if query has table properties specified
> --
>
> Key: HIVE-25012
> URL: https://issues.apache.org/jira/browse/HIVE-25012
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from 
> t1('acid.fetch.deleted.rows'='true')
> join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a;
> {code}
> When creating Join RelNode the aliases are used to lookup left and right 
> input RelNodes. Aliases are extracted from the AST subtree of the left and 
> right inputs of the join AST node. In case of a table reference:
> {code}
> TOK_TABREF
>TOK_TABNAME
>   t1
>TOK_TABLEPROPERTIES
>   TOK_TABLEPROPLIST
>  TOK_TABLEPROPERTY
> 'acid.fetch.deleted.rows'
> 'true'
> {code} 
> Prior to HIVE-24854, queries like the one mentioned above failed because the 
> existing solution did not expect TOK_TABLEPROPERTIES.
> The goal of this patch is to parse TOK_TABREF properly, using the existing 
> solution also used in SemanticAnalyzer.doPhase1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25012) Parsing table alias is failing if query has table properties specified

2021-04-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25012:
-


> Parsing table alias is failing if query has table properties specified
> --
>
> Key: HIVE-25012
> URL: https://issues.apache.org/jira/browse/HIVE-25012
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Parser
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from 
> t1('acid.fetch.deleted.rows'='true')
> join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a;
> {code}
> When creating Join RelNode the aliases are used to lookup left and right 
> input RelNodes. Aliases are extracted from the AST subtree of the left and 
> right inputs of the join AST node. In case of a table reference:
> {code}
> TOK_TABREF
>TOK_TABNAME
>   t1
>TOK_TABLEPROPERTIES
>   TOK_TABLEPROPLIST
>  TOK_TABLEPROPERTY
> 'acid.fetch.deleted.rows'
> 'true'
> {code} 
> Prior to HIVE-24854, queries like the one mentioned above failed because the 
> existing solution did not expect TOK_TABLEPROPERTIES.
> The goal of this patch is to parse TOK_TABREF properly, using the existing 
> solution also used in SemanticAnalyzer.doPhase1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations

2021-04-13 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24854:
---
Fix Version/s: 4.0.0

> Incremental Materialized view refresh in presence of update/delete operations
> -
>
> Key: HIVE-24854
> URL: https://issues.apache.org/jira/browse/HIVE-24854
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Current implementation of incremental Materialized view rebuild cannot be 
> used if any of the Materialized view source tables has had update or delete 
> operations since the last rebuild. In such cases a full rebuild should be 
> performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row deleted
> 2. Execute the query in the view definition
> 2.a. Add a filter to each table scan in order to pull only the rows from each 
> source table which have a higher writeId than the writeId of the last rebuild 
> - this is already implemented by the current incremental rebuild
> 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, 
> if any of the branches has a deleted row, the result row is also deleted.
> We should distinguish two types of view definition queries: with and without 
> Aggregate.
> 3.a. No aggregate path:
> Rewrite the plan of the full rebuild to a multi-insert statement with two 
> insert branches: one branch to insert new rows into the materialized view 
> table, and the second one to insert deleted rows into the materialized view 
> delete delta.
> 3.b. Aggregate path: TBD
> Prerequisite:
> source tables haven't been compacted since the last MV rebuild
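The per-row decisions in steps 2.a, 2.b, and 3.a can be sketched as small pure functions (hypothetical names; the real implementation operates on the relational plan, not on individual rows):

```java
public class IncrementalMvRebuild {
    /** Step 2.a: only rows written after the last rebuild participate. */
    public static boolean newSinceRebuild(long rowWriteId, long lastRebuildWriteId) {
        return rowWriteId > lastRebuildWriteId;
    }

    /** Step 2.b: a joined row is deleted if the row from any branch is deleted. */
    public static boolean joinedRowDeleted(boolean leftDeleted, boolean rightDeleted) {
        return leftDeleted || rightDeleted;
    }

    /** Step 3.a: route a result row to the MV table or to its delete delta. */
    public static String targetBranch(boolean rowDeleted) {
        return rowDeleted ? "delete-delta insert" : "mv insert";
    }

    public static void main(String[] args) {
        // A row updated after the last rebuild (writeId 7 > 5) whose join
        // partner came from a delete delta lands in the delete-delta branch.
        boolean qualifies = newSinceRebuild(7, 5);
        System.out.println(qualifies + " -> " + targetBranch(joinedRowDeleted(false, true)));
    }
}
```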



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24993) AssertionError when referencing ROW__ID.writeId

2021-04-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24993.
---
Resolution: Fixed

> AssertionError when referencing ROW__ID.writeId
> ---
>
> Key: HIVE-24993
> URL: https://issues.apache.org/jira/browse/HIVE-24993
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.support.concurrency=true;
> create table t1(a int, b float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a, b) values (1, 1.1);
> insert into t1(a, b) values (2, 2.2);
> SELECT t1.ROW__ID
> FROM t1
> WHERE t1.ROW__ID.writeid > 1;
> {code}
> {code}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5680)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5570)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5530)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3385)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3706)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3717)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5281)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1839)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1785)
>   at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1546)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12582)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Resolved] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations

2021-04-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-24854.
---
Resolution: Fixed

Pushed to master. Thanks [~jcamachorodriguez] for review.

> Incremental Materialized view refresh in presence of update/delete operations
> -
>
> Key: HIVE-24854
> URL: https://issues.apache.org/jira/browse/HIVE-24854
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Current implementation of incremental Materialized view rebuild cannot be 
> used if any of the Materialized view source tables has had update or delete 
> operations since the last rebuild. In such cases a full rebuild should be 
> performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row deleted
> 2. Execute the query in the view definition
> 2.a. Add a filter to each table scan in order to pull only the rows from each 
> source table which have a higher writeId than the writeId of the last rebuild 
> - this is already implemented by the current incremental rebuild
> 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, 
> if any of the branches has a deleted row, the result row is also deleted.
> We should distinguish two types of view definition queries: with and without 
> Aggregate.
> 3.a. No aggregate path:
> Rewrite the plan of the full rebuild to a multi-insert statement with two 
> insert branches: one branch to insert new rows into the materialized view 
> table, and the second one to insert deleted rows into the materialized view 
> delete delta.
> 3.b. Aggregate path: TBD
> Prerequisite:
> source tables haven't been compacted since the last MV rebuild



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24854?focusedWorklogId=582153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582153
 ]

ASF GitHub Bot logged work on HIVE-24854:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 02:12
Start Date: 14/Apr/21 02:12
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #2119:
URL: https://github.com/apache/hive/pull/2119


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582153)
Time Spent: 2h 20m  (was: 2h 10m)

> Incremental Materialized view refresh in presence of update/delete operations
> -
>
> Key: HIVE-24854
> URL: https://issues.apache.org/jira/browse/HIVE-24854
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Current implementation of incremental Materialized view rebuild cannot be 
> used if any of the Materialized view source tables has had update or delete 
> operations since the last rebuild. In such cases a full rebuild should be 
> performed.
> Steps to enable incremental rebuild:
> 1. Introduce a new virtual column to mark a row deleted
> 2. Execute the query in the view definition
> 2.a. Add a filter to each table scan in order to pull only the rows from each 
> source table which have a higher writeId than the writeId of the last rebuild 
> - this is already implemented by the current incremental rebuild
> 2.b. Add the row-is-deleted virtual column to each table scan. In join nodes, 
> if any of the branches has a deleted row, the result row is also deleted.
> We should distinguish two types of view definition queries: with and without 
> Aggregate.
> 3.a. No aggregate path:
> Rewrite the plan of the full rebuild to a multi-insert statement with two 
> insert branches: one branch to insert new rows into the materialized view 
> table, and the second one to insert deleted rows into the materialized view 
> delete delta.
> 3.b. Aggregate path: TBD
> Prerequisite:
> source tables haven't been compacted since the last MV rebuild



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24591) Move Beeline To SLF4J Simple Logger

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24591?focusedWorklogId=582119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582119
 ]

ASF GitHub Bot logged work on HIVE-24591:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 00:18
Start Date: 14/Apr/21 00:18
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1833:
URL: https://github.com/apache/hive/pull/1833


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582119)
Time Spent: 2h 40m  (was: 2.5h)

> Move Beeline To SLF4J Simple Logger
> ---
>
> Key: HIVE-24591
> URL: https://issues.apache.org/jira/browse/HIVE-24591
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> To make Beeline as simple as possible, move its SLF4J logger implementation 
> to the SLF4J Simple logger.  This will allow users to change the logging level 
> simply on the command line.  Currently users must create a Log4J configuration 
> file, which is far too advanced/cumbersome for a data analyst who just wants 
> to use SQL (and do some minor troubleshooting).
> {code:none}
> export HADOOP_CLIENT_OPTS="-Dorg.slf4j.simpleLogger.defaultLogLevel=debug"
> beeline ...
> {code}
> http://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24717) Migrate to listStatusIterator in moving files

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24717?focusedWorklogId=582118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582118
 ]

ASF GitHub Bot logged work on HIVE-24717:
-

Author: ASF GitHub Bot
Created on: 14/Apr/21 00:18
Start Date: 14/Apr/21 00:18
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1934:
URL: https://github.com/apache/hive/pull/1934


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582118)
Time Spent: 1h  (was: 50m)

> Migrate to listStatusIterator in moving files
> -
>
> Key: HIVE-24717
> URL: https://issues.apache.org/jira/browse/HIVE-24717
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa İman
>Assignee: Mustafa İman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive.java has various calls to the HDFS listStatus call when moving 
> files/directories around. These codepaths are used for insert overwrite 
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a 
> RemoteIterator and fetches pages in the background. Hive should take 
> advantage of that, since Hadoop has recently implemented listStatusIterator 
> for S3: https://issues.apache.org/jira/browse/HADOOP-17074
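The behavioral difference can be mimicked with plain java.nio (this is an analogy, not the Hadoop API itself): a DirectoryStream yields entries as the iterator advances, the way listStatusIterator pages results, so an early-exiting consumer never pays for the full listing that the blocking listStatus call would materialize.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyListing {
    /**
     * Analogy in plain java.nio: DirectoryStream yields entries lazily (like
     * Hadoop's RemoteIterator behind listStatusIterator), so a consumer that
     * stops early never lists the remaining entries, unlike the blocking
     * listStatus call that returns a complete array up front.
     */
    public static Path findFirst(Path dir, String name) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (p.getFileName().toString().equals(name)) {
                    return p;        // stop without listing remaining entries
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy");
        Files.createFile(dir.resolve("part-00000"));
        System.out.println(findFirst(dir, "part-00000") != null);  // true
    }
}
```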



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?focusedWorklogId=582074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582074
 ]

ASF GitHub Bot logged work on HIVE-25004:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 21:41
Start Date: 13/Apr/21 21:41
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #2170:
URL: https://github.com/apache/hive/pull/2170#issuecomment-819072492


    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582074)
Time Spent: 0.5h  (was: 20m)

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?focusedWorklogId=582073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582073
 ]

ASF GitHub Bot logged work on HIVE-24997:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 21:40
Start Date: 13/Apr/21 21:40
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #2166:
URL: https://github.com/apache/hive/pull/2166#issuecomment-819072149


    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582073)
Time Spent: 0.5h  (was: 20m)

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since HIVE-24230, it assumes the UDF is evaluated on HS2, which is not true 
> in general. The SessionState is only available during compile-time evaluation; 
> later on, a new interpreter should be instantiated.
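The fix direction described above (do not rely on HS2's SessionState; instantiate an interpreter lazily where the UDF actually runs) can be sketched as follows. This is only an illustration; the class and method names are hypothetical, not the actual HPL/SQL code:

```java
// Sketch: a UDF that cannot assume it runs inside HS2. The interpreter is
// created lazily, once per container process, the first time evaluate() runs.
public class LazyUdfSketch {

    interface Interpreter {
        String run(String code);
    }

    static class Udf {
        private transient Interpreter interpreter; // not shipped to containers

        String evaluate(String code) {
            if (interpreter == null) {
                // First call in this process: build a fresh interpreter instead
                // of expecting a compile-time SessionState to still be around.
                interpreter = c -> "evaluated:" + c;
            }
            return interpreter.run(code);
        }
    }

    public static void main(String[] args) {
        Udf udf = new Udf();
        System.out.println(udf.evaluate("1+1")); // evaluated:1+1
    }
}
```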



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582072
 ]

ASF GitHub Bot logged work on HIVE-24914:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 21:30
Start Date: 13/Apr/21 21:30
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #2108:
URL: https://github.com/apache/hive/pull/2108#issuecomment-819067487


    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582072)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority cannot be scheduled completely, we bail out, as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, when scheduling these tasks based on 
> their priority and locality preferences.
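The optimization above can be sketched in miniature (an illustration only, not the actual LlapTaskSchedulerService code; host and task names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of capacity-aware scheduling: pending tasks are matched
// only against hosts that still have free slots, instead of every host.
public class CapacitySketch {

    // freeSlots maps hosts (in consistent order) to their remaining free slots.
    static List<String> schedule(Map<String, Integer> freeSlots, List<String> pendingTasks) {
        // Collect only the hosts with capacity; full hosts are never visited.
        Deque<String> hostsWithCapacity = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            if (e.getValue() > 0) {
                hostsWithCapacity.add(e.getKey());
            }
        }
        List<String> assignments = new ArrayList<>();
        for (String task : pendingTasks) {
            if (hostsWithCapacity.isEmpty()) {
                break; // bail out: lower-priority tasks cannot be placed either
            }
            String host = hostsWithCapacity.poll();
            assignments.add(task + "->" + host);
            int remaining = freeSlots.merge(host, -1, Integer::sum);
            if (remaining > 0) {
                hostsWithCapacity.add(host); // still has room, keep it eligible
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, Integer> freeSlots = new LinkedHashMap<>();
        freeSlots.put("host1", 2);
        freeSlots.put("host2", 0); // full host: skipped entirely
        freeSlots.put("host3", 1);
        // t4 stays pending because no host has capacity left
        System.out.println(schedule(freeSlots, List.of("t1", "t2", "t3", "t4")));
        // [t1->host1, t2->host3, t3->host1]
    }
}
```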



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582068
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 21:19
Start Date: 13/Apr/21 21:19
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on pull request #2123:
URL: https://github.com/apache/hive/pull/2123#issuecomment-819061922


    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582068)
Time Spent: 1.5h  (was: 1h 20m)

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> --
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn 
> in LlapTaskSchedulerService::preemptTasksFromMap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582067
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 21:17
Start Date: 13/Apr/21 21:17
Worklog Time Spent: 10m 
  Work Description: mustafaiman commented on a change in pull request #2123:
URL: https://github.com/apache/hive/pull/2123#discussion_r612782613



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -429,6 +437,11 @@ public LlapTaskSchedulerService(TaskSchedulerContext 
taskSchedulerContext, Clock
 delayedTaskSchedulerExecutor =
 MoreExecutors.listeningDecorator(delayedTaskSchedulerExecutorRaw);
 
+ExecutorService preemptTaskSchedulerExecutorRaw = 
Executors.newFixedThreadPool(1,

Review comment:
   I checked that too and got confused. LlapTaskScheduler does the work of 
finding preemption candidates even though preemption cannot occur in the 
end. Also, LlapTaskScheduler marks tasks as preempted and updates preemption 
stats even though nothing is preempted, because 
LLAP_DAEMON_TASK_SCHEDULER_ENABLE_PREEMPTION is false. Am I understanding this 
correctly?
   
   This is not a problem with this patch, obviously. I am just asking to 
understand. I'll +1 this regardless.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582067)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> --
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn 
> in LlapTaskSchedulerService::preemptTasksFromMap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan updated HIVE-25011:

Attachment: HIVE-25011.2.patch

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch, HIVE-25011.2.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582060
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:49
Start Date: 13/Apr/21 20:49
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2123:
URL: https://github.com/apache/hive/pull/2123#discussion_r612766954



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -429,6 +437,11 @@ public LlapTaskSchedulerService(TaskSchedulerContext 
taskSchedulerContext, Clock
 delayedTaskSchedulerExecutor =
 MoreExecutors.listeningDecorator(delayedTaskSchedulerExecutorRaw);
 
+ExecutorService preemptTaskSchedulerExecutorRaw = 
Executors.newFixedThreadPool(1,

Review comment:
   Well, I agree, but the actual LLAP preemption conf we are using, 
https://github.com/apache/hive/blob/61d5c641b2e414c7b7dfd92f2b402db3583507c8/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L5023
   
   actually targets the TaskExecutorService within the LlapDaemon (waitQueue 
tasks vs Running) and not the LlapTaskSchedulingService -- in a sense this is 
a different type of preemption, and I am not sure we should just use 
the same conf here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582060)
Time Spent: 1h 10m  (was: 1h)

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> --
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn 
> in LlapTaskSchedulerService::preemptTasksFromMap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320533#comment-17320533
 ] 

Gopal Vijayaraghavan commented on HIVE-25011:
-

{code}
  /**
   * Find whether we should execute the current query due to explain
   * @return true if the query needs to be executed, false if not
   */
  public boolean isExplainSkipExecution() {
return (explainConfig != null && explainConfig.getAnalyze() != 
AnalyzeState.RUNNING);
  }
{code}

Looks like the comment is actually wrong on the "return true"

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan reassigned HIVE-25011:
---

Assignee: Gopal Vijayaraghavan

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan updated HIVE-25011:

Status: Patch Available  (was: Open)

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan updated HIVE-25011:

Attachment: HIVE-25011.1.patch

> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Priority: Major
> Attachments: HIVE-25011.1.patch
>
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582055
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:34
Start Date: 13/Apr/21 20:34
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2123:
URL: https://github.com/apache/hive/pull/2123#discussion_r612758022



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -2324,7 +2278,114 @@ private void maybeAddToDelayedTaskQueue(TaskInfo 
taskInfo) {
 }
   }
 
+  private void maybeAddToHighPriorityTaskQueue(TaskInfo taskInfo) {
+// Only add task if it's not already in the Queue AND there are no more than 
HOSTS tasks there already
+// as we are performing up to HOSTS preemptions at a time
+if (!taskInfo.isInHighPriorityQueue() && highPriorityTaskQueue.size() < 
activeInstances.size()) {
+  taskInfo.setInHighPriorityQueue(true);
+  highPriorityTaskQueue.add(taskInfo);
+}
+  }
+
   // -- Inner classes defined after this point --
+  class PreemptionSchedulerCallable implements Callable {
+private final AtomicBoolean isShutdown = new AtomicBoolean(false);
+
+@Override
+public Void call() {
+  while (!isShutdown.get() && !Thread.currentThread().isInterrupted()) {
+try {
+  TaskInfo taskInfo = getNextTask();
+  // Tasks can exist in the queue even after they have been scheduled.
+  // Process task Preemption only if the task is still in PENDING 
state.
+  processTaskPreemption(taskInfo);
+
+} catch (InterruptedException e) {
+  if (isShutdown.get()) {
+LOG.info("PreemptTaskScheduler thread interrupted after shutdown");
+break;
+  } else {
+LOG.warn("PreemptTaskScheduler thread interrupted before being 
shutdown");
+throw new RuntimeException("PreemptTaskScheduler thread 
interrupted without being shutdown", e);
+  }
+}
+  }
+  return null;
+}
+
+private void processTaskPreemption(TaskInfo taskInfo) {
+  if (shouldAttemptTask(taskInfo) && tryTaskPreemption(taskInfo)) {
+trySchedulingPendingTasks();
+  }
+  // Enables scheduler to reAdd task in Queue if needed
+  taskInfo.setInHighPriorityQueue(false);
+}
+
+private boolean tryTaskPreemption(TaskInfo taskInfo) {
+  // Find a lower priority task that can be preempted on a particular host.
+  // ONLY if there's no pending preemptions on that host to avoid 
preempting twice for a task.
+  Set potentialHosts = null; // null => preempt on any host.
+  readLock.lock();
+  try {
+// Protect against a bad location being requested.
+if (taskInfo.requestedHosts != null && taskInfo.requestedHosts.length 
!= 0) {
+  potentialHosts = Sets.newHashSet(taskInfo.requestedHosts);
+}
+if (potentialHosts != null) {
+  // Preempt on specific host
+  boolean shouldPreempt = true;
+  for (String host : potentialHosts) {
+// Preempt only if there are no pending preemptions on the same 
host
+// When the preemption registers, the request at the highest 
priority will be given the slot,
+// even if the initial preemption was caused by some other task.
+// TODO Maybe register which task the preemption was for, to avoid 
a bad non-local allocation.
+MutableInt pendingHostPreemptions = 
pendingPreemptionsPerHost.get(host);
+if (pendingHostPreemptions != null && 
pendingHostPreemptions.intValue() > 0) {
+  shouldPreempt = false;
+  LOG.debug("No preempt candidate for task={}. Found an existing 
preemption request on host={}, pendingPreemptionCount={}",
+  taskInfo.task, host, pendingHostPreemptions.intValue());
+  break;
+}
+  }
+
+  if (!shouldPreempt) {
+LOG.debug("No preempt candidate for {} on potential hosts={}. An 
existing preemption request exists",
+taskInfo.task, potentialHosts);
+return false;
+  }
+} else {
+  // Unknown requested host -- Request for a preemption if there's 
none pending. If a single preemption is pending,
+  // and this is the next task to be assigned, it will be assigned 
once that slot becomes available.
+  if (pendingPreemptions.get() != 0) {
+LOG.debug("Skipping preempt candidate since there are {} pending 
preemption request. For task={}",
+pendingPreemptions.get(), taskInfo);
+return false;
+  }
+}
+
+LOG.debug("Attempting preempt candidate for task={}, priority={} on 

[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582054
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:34
Start Date: 13/Apr/21 20:34
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2123:
URL: https://github.com/apache/hive/pull/2123#discussion_r612757788



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -3049,7 +3131,7 @@ boolean isUpdateInProgress() {
   return isPendingUpdate;
 }
 
-TezTaskAttemptID getAttemptId() {
+synchronized TezTaskAttemptID getAttemptId() {

Review comment:
   Ack removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582054)
Time Spent: 50m  (was: 40m)

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> --
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn 
> in LlapTaskSchedulerService::preemptTasksFromMap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582052
 ]

ASF GitHub Bot logged work on HIVE-24472:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:33
Start Date: 13/Apr/21 20:33
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2123:
URL: https://github.com/apache/hive/pull/2123#discussion_r612757340



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -1954,6 +1911,37 @@ protected void schedulePendingTasks() throws 
InterruptedException {
   break;
 }
   }
+  // Finally take care of preemption requests that can unblock higher-pri 
tasks.
+  // This removes preemptable tasks from the runningList and sends out a 
preempt request to the system.
+  // Subsequent tasks will be scheduled once the de-allocate request for 
the preempted task is processed.
+  while (!preemptionCandidates.isEmpty()) {
+TaskInfo toPreempt = preemptionCandidates.take();
+// 1. task has not terminated
+if (toPreempt.isGuaranteed != null) {
+  String host = toPreempt.getAssignedNode().getHost();
+   // 2. is currently assigned 3. no preemption pending on that Host
+  if (toPreempt.getState() == TaskInfo.State.ASSIGNED &&
+  (pendingPreemptionsPerHost.get(host) == null || 
pendingPreemptionsPerHost.get(host).intValue() == 0)) {
+LOG.debug("Preempting task took {} ms {}", (clock.getTime() - 
toPreempt.getPreemptedTime()), toPreempt);

Review comment:
   Left it mostly to see how fast preemption messages are propagated, but I 
agree it's not super useful -- removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582052)
Time Spent: 40m  (was: 0.5h)

> Optimize LlapTaskSchedulerService::preemptTasksFromMap
> --
>
> Key: HIVE-24472
> URL: https://issues.apache.org/jira/browse/HIVE-24472
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rajesh Balamohan
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571!
> speculativeTasks could possibly include node information to reduce CPU burn 
> in LlapTaskSchedulerService::preemptTasksFromMap
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582047
 ]

ASF GitHub Bot logged work on HIVE-24914:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:29
Start Date: 13/Apr/21 20:29
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2108:
URL: https://github.com/apache/hive/pull/2108#discussion_r612754974



##
File path: 
llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskSchedulerService.java
##
@@ -1115,7 +1116,7 @@ public void 
testHostPreferenceMissesConsistentPartialAlive() throws IOException,
   // 3rd task requested host3, got host1 since host3 is dead and host4 is 
full
   assertEquals(HOST1, 
argumentCaptor2.getAllValues().get(2).getNodeId().getHost());
 
-  verify(tsWrapper.mockServiceInstanceSet, 
times(2)).getAllInstancesOrdered(true);
+  verify(tsWrapper.mockServiceInstanceSet, 
atLeast(2)).getAllInstancesOrdered(true);

Review comment:
   Good catch. The main reason this was converted to atLeast is that, before, 
getAllInstancesOrdered was only called when a Task had to roll over to 
the next node because that node was disabled, as in the test above.
   However, now we get the alive hosts in order on every scheduler call (using 
getAllInstancesOrdered when consistent hashing is used) -- the problem here is 
that we don't know how many times the scheduling loop will be called before the 
test finishes -- thus the change to atLeast(2), which is needed for the requests 
to make progress.
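The times(2) vs atLeast(2) distinction being discussed can be shown with a hand-rolled call counter. This is a sketch of the verification semantics only, not Mockito itself, and the method names are hypothetical:

```java
// Sketch of the verification semantics: times(n) demands an exact invocation
// count, while atLeast(n) only requires a lower bound -- which is what a
// scheduling loop with a nondeterministic number of iterations needs.
public class VerifySketch {
    private int calls = 0;

    void getAllInstancesOrdered() {
        calls++;
    }

    boolean verifiedTimes(int n) {
        return calls == n; // exact: breaks if the loop runs one extra time
    }

    boolean verifiedAtLeast(int n) {
        return calls >= n; // lower bound: robust to extra scheduler iterations
    }

    public static void main(String[] args) {
        VerifySketch mock = new VerifySketch();
        for (int i = 0; i < 3; i++) {
            mock.getAllInstancesOrdered(); // iteration count is not fixed
        }
        System.out.println(mock.verifiedTimes(2));   // false
        System.out.println(mock.verifiedAtLeast(2)); // true
    }
}
```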




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582047)
Time Spent: 1h 10m  (was: 1h)

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any) ,and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582041
 ]

ASF GitHub Bot logged work on HIVE-24914:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:15
Start Date: 13/Apr/21 20:15
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2108:
URL: https://github.com/apache/hive/pull/2108#discussion_r612746804



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -1536,20 +1492,24 @@ private SelectHostResult selectHost(TaskInfo request) {
   }
 
   // requested host is still alive but cannot accept task, pick the next 
available host in consistent order
-  for (int i = 0; i < allNodes.size(); i++) {
-NodeInfo nodeInfo = allNodes.get((i + requestedHostIdx + 1) % 
allNodes.size());
-// next node in consistent order died or does not have free slots, 
rollover to next
-if (nodeInfo == null || !nodeInfo.canAcceptTask()) {
-  continue;
-} else {
-  if (LOG.isDebugEnabled()) {
-LOG.debug("Assigning {} in consistent order when looking for first 
requested host, from #hosts={},"
-+ " requestedHosts={}", nodeInfo.toShortString(), 
allNodes.size(),
-  ((requestedHosts == null || requestedHosts.length == 0) ? "null" 
:
-requestedHostsDebugStr));
+  if (!activeNodesWithFreeSlots.isEmpty()) {
+NodeInfo nextSlot = null;
+boolean found = false;
+for (Entry> entry : 
availableHostMap.entrySet()) {
+  if (found && !entry.getValue().isEmpty()) {
+nextSlot = entry.getValue().iterator().next();
+break;
   }
-  return new SelectHostResult(nodeInfo);
+  if (entry.getKey().equals(firstRequestedHost)) found = true;
+}
+// rollover
+if (nextSlot == null) nextSlot = 
activeNodesWithFreeSlots.stream().findFirst().get();
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Assigning {} in consistent order when looking for first 
requested host, from #hosts={},"

Review comment:
   Sure fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582041)
Time Spent: 1h  (was: 50m)

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any) ,and not all available hosts, for scheduling these tasks based on 
> their priority and locality preferences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582039
 ]

ASF GitHub Bot logged work on HIVE-24914:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:12
Start Date: 13/Apr/21 20:12
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2108:
URL: https://github.com/apache/hive/pull/2108#discussion_r612744867



##
File path: 
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
##
@@ -1816,23 +1776,90 @@ private static boolean 
removeFromRunningTaskMap(TreeMap>> 
getResourceAvailability() {
+int memory = 0;
+int vcores = 0;
+int numInstancesFound = 0;
+Map> availableHostMap;
+readLock.lock();
+try {
+  // maintain insertion order (needed for Next slot in locality miss)
+  availableHostMap = new LinkedHashMap<>(instanceToNodeMap.size());
+  Collection instances = consistentSplits ?
+  // might also include Inactive instances
+  activeInstances.getAllInstancesOrdered(true):
+  // if consistent splits are NOT used we don't need the ordering as 
there will be no cache benefit anyways
+  activeInstances.getAll();
+  boolean foundSlot = false;
+  for (LlapServiceInstance inst : instances) {
+NodeInfo nodeInfo = instanceToNodeMap.get(inst.getWorkerIdentity());
+if (nodeInfo != null) {
+  List hostList = availableHostMap.get(nodeInfo.getHost());
+  if (hostList == null) {
+hostList = new ArrayList<>();
+availableHostMap.put(nodeInfo.getHost(), hostList);
+  }
+  if (!(inst instanceof InactiveServiceInstance)) {
+Resource r = inst.getResource();
+memory += r.getMemory();
+vcores += r.getVirtualCores();
+numInstancesFound++;
+// Only add to List Nodes with available resources
+// Hosts, however, exist even for nodes that do not currently have 
resources
+if (nodeInfo.canAcceptTask()) {
+  foundSlot = true;
+  hostList.add(nodeInfo);
+}
+  }
+}
+  }
+  // isClusterCapacityFull will be set to false on every 
trySchedulingPendingTasks call
+  // set it false here to bail out early when we know there are no 
resources available.
+  if (!foundSlot) {
+isClusterCapacityFull.set(true);
+  }
+} finally {
+  readLock.unlock();
+}
+if (LOG.isDebugEnabled()) {
+  LOG.debug("ResourceAvail: numInstancesFound={}, totalMem={}, 
totalVcores={} availableHosts: {}",

Review comment:
   sure fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 582039)
Time Spent: 50m  (was: 40m)

> Improve LLAP scheduling by only traversing hosts with capacity
> --
>
> Key: HIVE-24914
> URL: https://issues.apache.org/jira/browse/HIVE-24914
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *schedulePendingTasks* on the LlapTaskScheduler currently goes through all 
> the pending tasks and tries to allocate them based on their Priority -- if a 
> priority can not be scheduled completely, we bail out as lower priorities 
> would not be able to get allocations either.
> An optimization here could be to only walk through the nodes with capacity 
> (if any), and not all available hosts, when scheduling these tasks based on 
> their priority and locality preferences.
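The capacity-filtered traversal described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not Hive's actual scheduler code: `NodeInfo`, the slot counts, and the method names are stand-ins for the real `LlapTaskScheduler` types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the capacity-aware host map idea: instead of scanning every
// known host on each scheduling pass, build a map that only contains hosts
// whose nodes can currently accept a task.
public class CapacityAwareHostMap {
    // Minimal stand-in for the scheduler's per-node bookkeeping.
    static class NodeInfo {
        final String host;
        final int freeSlots;
        NodeInfo(String host, int freeSlots) { this.host = host; this.freeSlots = freeSlots; }
        boolean canAcceptTask() { return freeSlots > 0; }
    }

    // Build host -> nodes-with-capacity; hosts whose nodes are all full are
    // skipped entirely, so the scheduling loop never visits them.
    static Map<String, List<NodeInfo>> hostsWithCapacity(Collection<NodeInfo> nodes) {
        Map<String, List<NodeInfo>> available = new HashMap<>();
        for (NodeInfo n : nodes) {
            if (n.canAcceptTask()) {
                available.computeIfAbsent(n.host, h -> new ArrayList<>()).add(n);
            }
        }
        return available;
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(
            new NodeInfo("host-a", 2), new NodeInfo("host-b", 0), new NodeInfo("host-a", 0));
        Map<String, List<NodeInfo>> avail = hostsWithCapacity(nodes);
        // Only host-a has a node with free slots; an empty map would mean the
        // cluster is full and the scheduler can bail out early.
        System.out.println(avail.keySet());
        System.out.println(avail.isEmpty() ? "cluster full" : "capacity available");
    }
}
```

An empty result plays the role of the `isClusterCapacityFull` flag above: no hosts with capacity means pending tasks cannot be placed this round.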



--
This message was sent by Atlassian Jira (v8.3.4#803005)


[jira] [Work logged] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24947?focusedWorklogId=582037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582037
 ]

ASF GitHub Bot logged work on HIVE-24947:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 20:09
Start Date: 13/Apr/21 20:09
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #2176:
URL: https://github.com/apache/hive/pull/2176


   ### What changes were proposed in this pull request?
   Make sure Parquet values are decoded on the fly, as each Page can decide whether to encode its values or not.
   As a result we might end up with a VRB where half the values are encoded and the rest are not!
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### How was this patch tested?
   TODO -- add q.test
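The mixed-encoding situation described above can be illustrated with a small sketch. This is not Hive's actual reader code; the dictionary, value types, and method names are hypothetical. Parquet chooses dictionary encoding per page, so a batch of values gathered across pages can mix dictionary ids (Integer) with plain values, and a blind cast over the whole batch fails the way the stack trace in this issue shows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: decode each value based on where it came from,
// rather than assuming the whole batch shares one representation.
public class MixedPageDecode {
    // Hypothetical per-column dictionary built from an encoded page.
    static final String[] DICT = {"alpha", "beta"};

    // valueList holds Integer dictionary ids from encoded pages and
    // plain String literals from unencoded pages.
    static List<String> decode(List<Object> valueList) {
        List<String> out = new ArrayList<>();
        for (Object v : valueList) {
            if (v instanceof Integer) {
                out.add(DICT[(Integer) v]);   // decode on the fly
            } else {
                out.add((String) v);          // already a plain value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First page dictionary-encoded, second page plain.
        List<Object> batch = Arrays.<Object>asList(0, 1, "gamma");
        System.out.println(decode(batch)); // [alpha, beta, gamma]
    }
}
```

Casting every element of `batch` to one type would throw a `ClassCastException` on the mixed entries, which mirrors the `Integer cannot be cast to [B` failure reported here.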
   




Issue Time Tracking
---

Worklog Id: (was: 582037)
Remaining Estimate: 0h
Time Spent: 10m

> Casting exception when reading vectorized parquet file for insert into
> --
>
> Key: HIVE-24947
> URL: https://issues.apache.org/jira/browse/HIVE-24947
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have two parquet tables (target and source).
> Upon running the query:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> insert into target2 partition(part_col_1, part_col_2) select * from 
> source;{code}
> The following exception is thrown:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> [B
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>   ... 24 more
> {code}
> The same runs without problems when vectorization is turned off. 
> cc [~nareshpr]





[jira] [Updated] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24947:
--
Labels: pull-request-available  (was: )

> Casting exception when reading vectorized parquet file for insert into
> --
>
> Key: HIVE-24947
> URL: https://issues.apache.org/jira/browse/HIVE-24947
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have two parquet tables (target and source).
> Upon running the query:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> insert into target2 partition(part_col_1, part_col_2) select * from 
> source;{code}
> The following exception is thrown:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> [B
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>   ... 24 more
> {code}
> The same runs without problems when vectorization is turned off. 
> cc [~nareshpr]





[jira] [Assigned] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into

2021-04-13 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24947:
-

Assignee: Panagiotis Garefalakis

> Casting exception when reading vectorized parquet file for insert into
> --
>
> Key: HIVE-24947
> URL: https://issues.apache.org/jira/browse/HIVE-24947
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marton Bod
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have two parquet tables (target and source).
> Upon running the query:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> insert into target2 partition(part_col_1, part_col_2) select * from 
> source;{code}
> The following exception is thrown:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> [B
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>   ... 24 more
> {code}
> The same runs without problems when vectorization is turned off. 
> cc [~nareshpr]





[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN

2021-04-13 Thread Gopal Vijayaraghavan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal Vijayaraghavan updated HIVE-25011:

Description: 
{code}
EXPLAIN UPDATE ...
{code}

should not be in conflict with another active ongoing UPDATE operation.

  was:
{code}
EXPLAIN UPDATE ...
{code}

should be in conflict with another active ongoing UPDATE operation.


> Concurrency: Do not acquire locks for EXPLAIN
> -
>
> Key: HIVE-25011
> URL: https://issues.apache.org/jira/browse/HIVE-25011
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Transactions
>Affects Versions: 4.0.0
>Reporter: Gopal Vijayaraghavan
>Priority: Major
>
> {code}
> EXPLAIN UPDATE ...
> {code}
> should not be in conflict with another active ongoing UPDATE operation.
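The requested behaviour can be sketched as a simple predicate. The names here are illustrative, not Hive's actual lock-manager API: an EXPLAIN-only statement compiles the plan but never executes it, so lock acquisition can be skipped.

```java
// Hedged sketch of the locking rule this issue asks for: EXPLAIN UPDATE
// only shows the plan, so it should not block on, or conflict with, a
// concurrent real UPDATE.
public class ExplainLockSkip {
    enum StmtKind { EXPLAIN, UPDATE }

    static boolean needsLocks(StmtKind kind) {
        // Only statements that will actually execute need transactional locks.
        return kind != StmtKind.EXPLAIN;
    }

    public static void main(String[] args) {
        System.out.println(needsLocks(StmtKind.EXPLAIN)); // false
        System.out.println(needsLocks(StmtKind.UPDATE));  // true
    }
}
```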





[jira] [Assigned] (HIVE-25010) Create qtest-iceberg module

2021-04-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25010:



> Create qtest-iceberg module
> ---
>
> Key: HIVE-25010
> URL: https://issues.apache.org/jira/browse/HIVE-25010
> Project: Hive
>  Issue Type: Test
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> We should create a qtest-iceberg module under itests. 





[jira] [Work logged] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?focusedWorklogId=581988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581988
 ]

ASF GitHub Bot logged work on HIVE-25009:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 19:12
Start Date: 13/Apr/21 19:12
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #2175:
URL: https://github.com/apache/hive/pull/2175


   …PE if the COMPACTION_QUEUE is empty
   
   
   
   ### What changes were proposed in this pull request?
   
   Null check
   
   ### Why are the changes needed?
   
   Version check can cause NPE if the queue is empty.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Manual tests were conducted.
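The fix described in this PR (a null check) can be sketched as follows. The method names and return values are hypothetical stand-ins: when the COMPACTION_QUEUE has no rows, the query for the stored worker/initiator version yields null, and comparing it without a guard throws the NPE.

```java
// Sketch of guarding the version comparison against an empty queue.
public class VersionCheck {
    // Stand-in for reading the version column from the metastore;
    // returns null when COMPACTION_QUEUE is empty.
    static String storedVersion(boolean queueEmpty) {
        return queueEmpty ? null : "4.0.0";
    }

    static boolean versionMatches(String runtimeVersion, boolean queueEmpty) {
        String stored = storedVersion(queueEmpty);
        // Null check: an empty queue means "nothing to compare",
        // not a version mismatch.
        return stored == null || stored.equals(runtimeVersion);
    }

    public static void main(String[] args) {
        System.out.println(versionMatches("4.0.0", true));  // true, no NPE
        System.out.println(versionMatches("4.0.0", false)); // true
    }
}
```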




Issue Time Tracking
---

Worklog Id: (was: 581988)
Remaining Estimate: 0h
Time Spent: 10m

> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25009:
--
Labels: pull-request-available  (was: )

> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25009:
---
Affects Version/s: 4.0.0

> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>






[jira] [Work started] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25009 started by Antal Sinkovits.
--
> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>






[jira] [Assigned] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25009:
--


> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>






[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty

2021-04-13 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25009:
---
Component/s: Transactions

> Compaction worker and initiator version check can cause NPE if the 
> COMPACTION_QUEUE is empty
> 
>
> Key: HIVE-25009
> URL: https://issues.apache.org/jira/browse/HIVE-25009
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>






[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581961
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 18:41
Start Date: 13/Apr/21 18:41
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612691469



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro

Review comment:
   I think avro is not just a test dependency - can we move this above the comment?





[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581960
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 18:40
Start Date: 13/Apr/21 18:40
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612690771



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server

Review comment:
   I think we only used the `metastore-common` dependency - do we need to 
declare the server as well? it's listed below with the `tests` classifier 
already






Issue Time Tracking
---

Worklog Id: (was: 581960)
Time Spent: 3h  (was: 2h 50m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.





[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581959
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 18:39
Start Date: 13/Apr/21 18:39
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612690167



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service

Review comment:
   hive-service and avro are not just test dependencies 






Issue Time Tracking
---

Worklog Id: (was: 581959)
Time Spent: 2h 50m  (was: 2h 40m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581958
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 18:38
Start Date: 13/Apr/21 18:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612689560



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core

Review comment:
   I think we're getting this via 

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581949
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 18:30
Start Date: 13/Apr/21 18:30
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612684745



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+   

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581933
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 17:56
Start Date: 13/Apr/21 17:56
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612662787



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+
+  

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581858
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:38
Start Date: 13/Apr/21 15:38
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612561610



##
File path: pom.xml
##
@@ -64,8 +64,7 @@
 standalone-metastore
 upgrade-acid
 kafka-handler
-iceberg-handler
-iceberg-catalog
+iceberg

Review comment:
   Great, thanks






Issue Time Tracking
---

Worklog Id: (was: 581858)
Time Spent: 2h 10m  (was: 2h)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.
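
The module layout proposed above would look roughly like this in the parent pom (an illustrative sketch matching the module list quoted in the review diff, not the exact committed file):

```xml
<!-- iceberg/pom.xml (sketch): aggregator module for the iceberg submodules -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <artifactId>hive-iceberg</artifactId>
  <packaging>pom</packaging>
  <name>Hive Iceberg Modules</name>
  <modules>
    <module>iceberg-catalog</module>
    <module>iceberg-handler</module>
  </modules>
</project>
```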



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581856
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:32
Start Date: 13/Apr/21 15:32
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612556987



##
File path: pom.xml
##
@@ -64,8 +64,7 @@
 standalone-metastore
 upgrade-acid
 kafka-handler
-iceberg-handler
-iceberg-catalog
+iceberg

Review comment:
   Applied my other patches so there is a new class in 
`patched-iceberg-api` (`CommitStateUnknownException`)
   Run the packaging and checked the `packaging/target/.../lib` and it still 
contains the libs:
   ```
   $ cd 
packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/
   $ ls *iceberg*
   hive-iceberg-catalog-4.0.0-SNAPSHOT.jar  
hive-iceberg-handler-4.0.0-SNAPSHOT.jar
   ```
   
   The new class is in the jar:
   ```
   $ zip -sf hive-iceberg-handler-4.0.0-SNAPSHOT.jar |grep 
CommitStateUnknownException
 
org/apache/hive/iceberg/org/apache/iceberg/exceptions/CommitStateUnknownException.class
   ```
   
   So this seems ok
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581856)
Time Spent: 2h  (was: 1h 50m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581853
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:27
Start Date: 13/Apr/21 15:27
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612552854



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
##
@@ -422,7 +422,7 @@ private String extractTableFullName(StatsTask tsk) throws 
SemanticException {
 TableSpec tableSpec = new TableSpec(table, partitions);
 tableScan.getConf().getTableMetadata().setTableSpec(tableSpec);
 
-if (BasicStatsNoJobTask.canUseFooterScan(table, inputFormat)) {
+if (BasicStatsNoJobTask.canUseColumnStats(table, inputFormat)) {

Review comment:
   Right, fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581853)
Time Spent: 6h 10m  (was: 6h)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.
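
The overcounting described above can be sketched with hypothetical helpers (the class, method names, and file lists below are illustrative, not Hive's actual BasicStatsTask code; `total-data-files` is the Iceberg snapshot summary key):

```java
import java.util.List;
import java.util.Map;

public class BasicStatsSketch {

  // Directory-scan approach: counts every file under the table location,
  // including Iceberg metadata such as manifests and snapshot JSON.
  public static long scanNumFiles(List<String> filesInTableDir) {
    return filesInTableDir.size();
  }

  // Storage-handler approach: the handler reports data files only,
  // here read from a snapshot-summary-like map.
  public static long handlerNumFiles(Map<String, String> snapshotSummary) {
    return Long.parseLong(snapshotSummary.getOrDefault("total-data-files", "0"));
  }

  public static void main(String[] args) {
    List<String> dir = List.of(
        "data/00000-0.parquet", "data/00001-0.parquet",
        "metadata/v1.metadata.json", "metadata/snap-1.avro");
    System.out.println("scan=" + scanNumFiles(dir));        // prints scan=4 (inflated by metadata)
    System.out.println("handler=" + handlerNumFiles(Map.of("total-data-files", "2"))); // prints handler=2
  }
}
```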



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581849
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:26
Start Date: 13/Apr/21 15:26
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612551277



##
File path: 
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##
@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf 
jobConf, Deserializer dese
 return predicate;
   }
 
+  @Override
+  public boolean canProvideBasicStatistics() {
+return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+Map<String, String> stats = new HashMap<>();
+if (table.currentSnapshot() != null) {
+  Map<String, String> summary = table.currentSnapshot().summary();
+  if (summary != null) {
+if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+  stats.put(StatsSetupConst.NUM_FILES, 
summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+}
+if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+  stats.put(StatsSetupConst.ROW_COUNT, 
summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+}
+// TODO: add TOTAL_SIZE when iceberg 0.12 is released
+if (summary.containsKey("total-files-size")) {
+  stats.put(StatsSetupConst.TOTAL_SIZE, 
summary.get("total-files-size"));
+}
+  }
+} else {
+  stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment:
   In the case of an empty table, the current snapshot is null. I thought 
setting all the basic stats to 0 is the right approach since we don't have any 
data. 
   When the summary of the snapshot is not available I return an empty 
statistics map. 
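
A minimal sketch of the branching described in this comment (hypothetical stand-in types and summary keys, not the real HiveIcebergStorageHandler): an empty table with no current snapshot reports zeroed basic stats, while a present snapshot with a missing summary yields an empty map.

```java
import java.util.HashMap;
import java.util.Map;

public class StatsForEmptyTable {

  public static Map<String, String> basicStats(Map<String, String> summary, boolean hasSnapshot) {
    Map<String, String> stats = new HashMap<>();
    if (!hasSnapshot) {
      // Empty table: no data at all, so every basic stat is 0.
      stats.put("numFiles", "0");
      stats.put("numRows", "0");
      stats.put("totalSize", "0");
    } else if (summary != null) {
      if (summary.containsKey("total-data-files")) {
        stats.put("numFiles", summary.get("total-data-files"));
      }
      if (summary.containsKey("total-records")) {
        stats.put("numRows", summary.get("total-records"));
      }
    }
    // hasSnapshot with a null summary falls through: empty stats map.
    return stats;
  }

  public static void main(String[] args) {
    System.out.println(basicStats(null, false)); // zeroed stats for an empty table
    System.out.println(basicStats(null, true));  // {} when the summary is missing
  }
}
```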




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581849)
Time Spent: 6h  (was: 5h 50m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581843
 ]

ASF GitHub Bot logged work on HIVE-24928:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:19
Start Date: 13/Apr/21 15:19
Worklog Time Spent: 10m 
  Work Description: lcspinter commented on a change in pull request #2111:
URL: https://github.com/apache/hive/pull/2111#discussion_r612545094



##
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java
##
@@ -119,16 +129,83 @@ public String getName() {
 return "STATS-NO-JOB";
   }
 
-  static class StatItem {
-Partish partish;
-Map params;
-Object result;
+  abstract static class StatCollector implements Runnable {
+
+protected Partish partish;
+protected Object result;
+protected LogHelper console;
+
+public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+sc -> String.format("%s#%s", 
sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = 
sc -> (Partition) sc.result();
+
+abstract Partish partish();
+abstract boolean isValid();
+abstract Object result();
+abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+protected String toString(Map<String, String> parameters) {
+  return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + 
parameters.get(st))
+  .collect(Collectors.joining(", "));
+}
   }
 
-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+public HiveStorageHandlerStatCollector(Partish partish) {
+  this.partish = partish;
+}
+
+@Override
+public void init(HiveConf conf, LogHelper console) throws IOException {
+  this.console = console;
+}
+
+@Override
+public void run() {
+  try {
+Table table = partish.getTable();
+Map<String, String> parameters = partish.getPartParameters();
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Map<String, String> basicStatistics = 
table.getStorageHandler().getBasicStatistics(tableDesc);

Review comment:
   Correct, I missed that. I will provide the `partish` object which is 
enough to calculate the table/partition stats on StorageHandler side. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581843)
Time Spent: 5h 50m  (was: 5h 40m)

> In case of non-native tables use basic statistics from HiveStorageHandler
> -
>
> Key: HIVE-24928
> URL: https://issues.apache.org/jira/browse/HIVE-24928
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE 
> ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by 
> the BasicStatsTask class. This class tries to estimate the statistics by 
> scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might 
> contain metadata files as well, which would be counted by the BasicStatsTask 
> when calculating basic stats. 
> Instead of having this logic, the HiveStorageHandler implementation should 
> provide basic statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581835
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:05
Start Date: 13/Apr/21 15:05
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612533431



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+   

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581830
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 15:00
Start Date: 13/Apr/21 15:00
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612529170



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+   

[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=581828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581828
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:54
Start Date: 13/Apr/21 14:54
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r612517973



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,22 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", 
"7200s",
+new TimeValidator(TimeUnit.SECONDS),
+"Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+
HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval",
 "30s",
+new TimeValidator(TimeUnit.SECONDS),
+"Reporting period for ACID metrics in seconds."),
+
HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold",
 100,
+"Threshold for the number of delta files to include in the ACID 
metrics report."),
+
HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold",
 100,

Review comment:
   Similar to above

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,22 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", 
"7200s",
+new TimeValidator(TimeUnit.SECONDS),
+"Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+
HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval",
 "30s",
+new TimeValidator(TimeUnit.SECONDS),
+"Reporting period for ACID metrics in seconds."),
+
HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold",
 100,

Review comment:
   Isn't this: the minimum number of active delta files a table/partition 
must have to be included in the report

##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -43,17 +44,18 @@
 import org.apache.hadoop.hive.metastore.metrics.Metrics;
 import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
 import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import 
org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter;
+import org.apache.tez.common.counters.TezCounters;
 import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
 
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Objects;
+import java.util.*;

Review comment:
   I think it's not recommended to import all classes in the package

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,22 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 

Review comment:
   Here I would add a comment like:
   Configs having to do with DeltaFilesMetricReporter, which collects lists of 
most recently active tables with the most number of active/obsolete deltas.

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3002,6 +3002,22 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
   "Enables read-only transaction classification and related 
optimizations"),
 
+
HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 
100,
+"Size of the ACID metrics cache. Only topN metrics would remain in the 
cache if exceeded."),
+
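
The cache semantics these configs describe (a bounded top-N of tables by delta count, with a reporting threshold) can be sketched in plain JDK code; this is a hypothetical illustration, not Hive's DeltaFilesMetricReporter implementation, and the class and method names below are invented:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DeltaMetricsSketch {
  private final int maxCacheSize;   // mirrors hive.txn.acid.metrics.max.cache.size
  private final int numThreshold;   // mirrors hive.txn.acid.metrics.delta.num.threshold
  private final Map<String, Integer> deltaCounts = new HashMap<>();

  public DeltaMetricsSketch(int maxCacheSize, int numThreshold) {
    this.maxCacheSize = maxCacheSize;
    this.numThreshold = numThreshold;
  }

  // Record the latest active-delta count for a table/partition.
  public void record(String tablePartition, int deltas) {
    deltaCounts.put(tablePartition, deltas);
  }

  // Entries below the threshold are excluded; the rest are trimmed to the
  // top-N entries with the most deltas, as the cache-size config describes.
  public Map<String, Integer> report() {
    return deltaCounts.entrySet().stream()
        .filter(e -> e.getValue() >= numThreshold)
        .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
        .limit(maxCacheSize)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    DeltaMetricsSketch metrics = new DeltaMetricsSketch(2, 10);
    metrics.record("default.tbl/p=1", 120);
    metrics.record("default.tbl/p=2", 5);    // below threshold, never reported
    metrics.record("default.tbl2/p=1", 50);
    metrics.record("default.tbl3/p=1", 30);  // trimmed: only the top 2 are kept
    System.out.println(metrics.report());    // two entries: default.tbl/p=1 and default.tbl2/p=1
  }
}
```

Hive's actual reporter additionally expires entries after a configured duration (hive.txn.acid.metrics.cache.duration), which this sketch omits.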

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581827
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:53
Start Date: 13/Apr/21 14:53
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612522412



##
File path: pom.xml
##
@@ -64,8 +64,7 @@
 standalone-metastore
 upgrade-acid
 kafka-handler
-iceberg-handler
-iceberg-catalog
+iceberg

Review comment:
   That's a good question!
   I will check...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581827)
Time Spent: 1.5h  (was: 1h 20m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581826
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:52
Start Date: 13/Apr/21 14:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612521207



##
File path: pom.xml
##
@@ -64,8 +64,7 @@
 standalone-metastore
 upgrade-acid
 kafka-handler
-iceberg-handler
-iceberg-catalog
+iceberg

Review comment:
   Just to check, after this refactor, the handler and catalog jars are 
still packaged into the `packaging/target/.../lib` directory? How about the 
patched jars (core and api)? Are they correctly included into the shaded 
handler jar with the patched classfiles, but not included into the packaging 
`/lib`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581826)
Time Spent: 1h 20m  (was: 1h 10m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581825
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:52
Start Date: 13/Apr/21 14:52
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612521207



##
File path: pom.xml
##
@@ -64,8 +64,7 @@
 standalone-metastore
 upgrade-acid
 kafka-handler
-iceberg-handler
-iceberg-catalog
+iceberg

Review comment:
   Just to check, after this refactor, the handler and catalog jars are 
still packaged into the `packaging/target/.../lib` directory? How about the 
patched jars (core and api)? Are they correctly included into the shaded 
handler jar with the patched classfiles, but the patched jars are not included 
in the packaging `/lib`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581825)
Time Spent: 1h 10m  (was: 1h)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581822
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:50
Start Date: 13/Apr/21 14:50
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612519866



##
File path: iceberg/patched-iceberg-core/pom.xml
##
@@ -0,0 +1,80 @@
+
+http://maven.apache.org/POM/4.0.0;
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive-iceberg
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+org.apache.iceberg
+iceberg-core
+patched-${iceberg-api.version}-${parent.version}
+Patched Iceberg Core
+
+
+
+
+
+
+
+
+
+
+
+../..
+..
+
+
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-common
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+
+
+
+org.apache.maven.plugins
+maven-dependency-plugin
+
+
+unpack
+generate-sources
+
+unpack
+
+
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+jar
+true
+
${project.build.directory}/classes
+

Review comment:
   Those will come with the individual changes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581822)
Time Spent: 1h  (was: 50m)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581821
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:49
Start Date: 13/Apr/21 14:49
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612518701



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+http://www.w3.org/2001/XMLSchema-instance;
+ xmlns="http://maven.apache.org/POM/4.0.0;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+
+  

[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581820
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:48
Start Date: 13/Apr/21 14:48
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612517993



##
File path: iceberg/patched-iceberg-core/pom.xml
##
@@ -0,0 +1,80 @@
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+org.apache.hive
+hive-iceberg
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+org.apache.iceberg
+iceberg-core
+patched-${iceberg-api.version}-${parent.version}
+Patched Iceberg Core
+
+
+
+
+
+
+
+
+
+
+
+../..
+..
+
+
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-common
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+
+
+
+org.apache.maven.plugins
+maven-dependency-plugin
+
+
+unpack
+generate-sources
+
+unpack
+
+
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+jar
+true
+
${project.build.directory}/classes
+

Review comment:
   I think I'm missing some step here: don't we need to list the 
SnapshotSummary class to be replaced here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581820)
Time Spent: 40m  (was: 0.5h)

> Move iceberg-handler under a hive-iceberg module
> 
>
> Key: HIVE-25003
> URL: https://issues.apache.org/jira/browse/HIVE-25003
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} 
> and subsequent iceberg modules under this module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
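The unpack configuration quoted above copies the upstream iceberg-core classes into the module's output directory, and the reviewer asks how the patched SnapshotSummary ends up replacing the upstream one. One plausible answer (a sketch only, not the actual patch: the exclusion pattern below is an assumption) is to exclude the upstream class file in the artifactItem so the locally compiled copy wins:

```xml
<artifactItem>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-core</artifactId>
  <version>${iceberg-api.version}</version>
  <type>jar</type>
  <overWrite>true</overWrite>
  <outputDirectory>${project.build.directory}/classes</outputDirectory>
  <!-- assumption: the class to be patched is excluded from the unpack -->
  <excludes>**/SnapshotSummary.class</excludes>
</artifactItem>
```

maven-dependency-plugin's unpack goal supports per-artifactItem `excludes` patterns, so no additional plugin would be needed for this.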


[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581816
 ]

ASF GitHub Bot logged work on HIVE-25003:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:44
Start Date: 13/Apr/21 14:44
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2169:
URL: https://github.com/apache/hive/pull/2169#discussion_r612513676



##
File path: iceberg/pom.xml
##
@@ -0,0 +1,325 @@
+
+
+<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xmlns="http://maven.apache.org/POM/4.0.0"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+org.apache.hive
+hive
+4.0.0-SNAPSHOT
+../pom.xml
+
+4.0.0
+
+hive-iceberg
+4.0.0-SNAPSHOT
+pom
+Hive Iceberg Modules
+
+
+..
+.
+0.11.0
+4.0.2
+1.9.2
+4.0.2
+
3.1.2
+2.5.0
+
+
+
+
+
+iceberg-catalog
+iceberg-handler
+
+
+
+
+
+org.apache.iceberg
+iceberg-api
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-core
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-hive-metastore
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-data
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-parquet
+${iceberg-api.version}
+
+
+org.apache.iceberg
+iceberg-orc
+${iceberg-api.version}
+
+
+
+org.apache.hive
+hive-iceberg-catalog
+${project.version}
+
+
+
+org.apache.hive
+hive-exec
+${project.version}
+
+
+com.google.code.findbugs
+jsr305
+
+
+com.google.guava
+*
+
+
+com.google.protobuf
+protobuf-java
+
+
+org.apache.avro
+avro
+
+
+org.apache.calcite.avatica
+*
+
+
+org.apache.hive
+hive-llap-tez
+
+
+org.apache.logging.log4j
+*
+
+
+org.pentaho
+*
+
+
+org.slf4j
+slf4j-log4j12
+
+
+
+
+org.apache.hive
+hive-serde
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-server
+${project.version}
+
+
+org.apache.hive
+hive-standalone-metastore-common
+${project.version}
+
+
+
+org.apache.hadoop
+hadoop-client
+${hadoop.version}
+
+
+org.apache.avro
+avro
+
+
+
+
+
+
+org.apache.hive
+hive-service
+${project.version}
+
+
+org.apache.hive
+hive-exec
+
+
+
+
+org.apache.hive
+hive-standalone-metastore-server
+tests
+${project.version}
+
+
+org.apache.hive
+hive-iceberg-catalog
+tests
+${project.version}
+
+
+
+org.apache.avro
+avro
+${iceberg.avro.version}
+
+
+org.apache.orc
+orc-core
+${orc.version}
+
+   

[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=581801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581801
 ]

ASF GitHub Bot logged work on HIVE-24978:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 14:15
Start Date: 13/Apr/21 14:15
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #2154:
URL: https://github.com/apache/hive/pull/2154#discussion_r611252750



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/drop/AlterTableDropPartitionOperation.java
##
@@ -120,6 +126,12 @@ private void dropPartitions() throws HiveException {
 List<Partition> droppedPartitions = 
context.getDb().dropPartitions(tablenName.getDb(), tablenName.getTable(),
 partitionExpressions, options);
 
+if (isRepl) {

Review comment:
   It does a lot of stuff below related to ``llap`` and printing to the
console, which I thought would be irrelevant here, so I added this check. In
the case of replication, I skip the part below and return from here.
   There is something like this below:
   
   ``
 // We have already locked the table, don't lock the partitions.
 DDLUtils.addIfAbsentByName(new WriteEntity(partition, WriteEntity.WriteType.DDL_NO_LOCK), context);
   ``
   
   I am not very sure about this DDLUtils part only. Let me know if you find
it relevant, and I will add it in my block as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581801)
Time Spent: 40m  (was: 0.5h)

> Optimise number of DROP_PARTITION events created.
> -
>
> Key: HIVE-24978
> URL: https://issues.apache.org/jira/browse/HIVE-24978
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Even for drop partition with batches, there is presently one event for every 
> partition. Optimise by merging them to reduce the number of calls to HMS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
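The issue above is about collapsing per-partition DROP_PARTITION events into fewer, batched events. A minimal sketch of the batching step in plain Java (the class and method names are illustrative, not Hive's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class DropPartitionBatchSketch {
    // Illustrative only: group dropped partition names into batches so one
    // DROP_PARTITION event can carry many partitions, instead of one event each.
    static List<List<String>> toEventBatches(List<String> droppedPartitions, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < droppedPartitions.size(); i += batchSize) {
            // subList gives a view of one batch's worth of partitions
            batches.add(droppedPartitions.subList(i, Math.min(i + batchSize, droppedPartitions.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("p=1", "p=2", "p=3", "p=4", "p=5");
        System.out.println(toEventBatches(parts, 2).size()); // 3
    }
}
```

Each inner list would then back a single DROP_PARTITION event, so the number of HMS notification calls drops from N to ceil(N / batchSize).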


[jira] [Updated] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25006:
--
Labels: pull-request-available  (was: )

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=581779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581779
 ]

ASF GitHub Bot logged work on HIVE-25006:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 13:39
Start Date: 13/Apr/21 13:39
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2161:
URL: https://github.com/apache/hive/pull/2161#discussion_r612455851



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -250,9 +255,32 @@ public int execute() {
   this.setException(new HiveException(monitor.getDiagnostics()));
 }
 
-// fetch the counters
 try {
   Set<StatusGetOpts> statusGetOpts = 
EnumSet.of(StatusGetOpts.GET_COUNTERS);
+  // save useful commit information into session conf, e.g. for custom 
commit hooks
+  List<BaseWork> allWork = work.getAllWork();
+  boolean hasReducer = 
allWork.stream().map(workToVertex::get).anyMatch(v -> 
v.getName().startsWith("Reducer"));
+  for (BaseWork baseWork : allWork) {
+Vertex vertex = workToVertex.get(baseWork);
+if (!hasReducer || vertex.getName().startsWith("Reducer")) {
+  // construct the parsable job id
+  VertexStatus status = 
dagClient.getVertexStatus(vertex.getName(), statusGetOpts);
+  String[] jobIdParts = status.getId().split("_");
+  // status.getId() returns something like: 
vertex_1617722404520_0001_1_00
+  // this should be transformed to a parsable JobID: 
job_16177224045200_0001
+  int vertexId = Integer.parseInt(jobIdParts[jobIdParts.length - 
1]);
+  String jobId = String.format("job_%s%d_%s", jobIdParts[1], 
vertexId, jobIdParts[2]);
+  // prefix with table name (for multi-table inserts), if available
+  String tableName = 
Optional.ofNullable(workToConf.get(baseWork)).map(c -> 
c.get("name")).orElse(null);
+  String jobIdKey = HIVE_TEZ_COMMIT_JOB_ID + (tableName == null ? 
"" : "." + tableName);
+  String taskCountKey = HIVE_TEZ_COMMIT_TASK_COUNT + (tableName == 
null ? "" : "." + tableName);
+  // save info into session conf
+  HiveConf sessionConf = SessionState.get().getConf();
+  sessionConf.set(jobIdKey, jobId);
+  sessionConf.setInt(taskCountKey, 
status.getProgress().getSucceededTaskCount());

Review comment:
   I'll look into this in the following PR, once we've replaced the 
temporary listing solution with the permanent one and upgraded the Tez 
dependency.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581779)
Remaining Estimate: 0h
Time Spent: 10m

> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
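The TezTask diff above converts a Tez vertex status id such as `vertex_1617722404520_0001_1_00` into a parsable JobID string. The string manipulation can be checked in isolation; the helper name below is mine, while the logic mirrors the quoted diff:

```java
public class VertexJobIdSketch {
    // Mirrors the transformation in the quoted TezTask diff (hypothetical helper name):
    // vertex_1617722404520_0001_1_00 -> job_16177224045200_0001
    static String toJobId(String vertexStatusId) {
        String[] parts = vertexStatusId.split("_");
        // the last segment is the numeric vertex id within the DAG
        int vertexId = Integer.parseInt(parts[parts.length - 1]);
        // parts[1] is the cluster timestamp, parts[2] the DAG/app sequence number
        return String.format("job_%s%d_%s", parts[1], vertexId, parts[2]);
    }

    public static void main(String[] args) {
        System.out.println(toJobId("vertex_1617722404520_0001_1_00")); // job_16177224045200_0001
    }
}
```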


[jira] [Assigned] (HIVE-25008) Migrate hive table data into Iceberg format.

2021-04-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25008:



> Migrate hive table data into Iceberg format.
> 
>
> Key: HIVE-25008
> URL: https://issues.apache.org/jira/browse/HIVE-25008
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>
> We should provide a way to migrate native hive table data files into Iceberg 
> format with just a simple ALTER TABLE ... SET 
> TBLPROPERTIES('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler')
>  command. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24876) Disable /longconf.jsp page on HS2 web UI for non admin users

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24876?focusedWorklogId=581749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581749
 ]

ASF GitHub Bot logged work on HIVE-24876:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 12:57
Start Date: 13/Apr/21 12:57
Worklog Time Spent: 10m 
  Work Description: yongzhi merged pull request #2063:
URL: https://github.com/apache/hive/pull/2063


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581749)
Time Spent: 1h 10m  (was: 1h)

> Disable /longconf.jsp page on HS2 web UI for non admin users
> 
>
> Key: HIVE-24876
> URL: https://issues.apache.org/jira/browse/HIVE-24876
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The /logconf.jsp page should be disabled for users that are not in admin 
> roles. Otherwise, any user can flood the log files with different log levels 
> that can be configured on the HS2 web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24958) Create Iceberg catalog module in Hive

2021-04-13 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-24958.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the patch [~Marton Bod] and [~lpinter] for the review!

> Create Iceberg catalog module in Hive
> -
>
> Key: HIVE-24958
> URL: https://issues.apache.org/jira/browse/HIVE-24958
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-catalog module in Hive, with the code currently 
> contained in Iceberg's iceberg-hive-metastore module
>  * Make sure all tests pass (including static analysis and checkstyle)
>  * Make iceberg-handler depend on this module instead of 
> iceberg-hive-metastore



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24958) Create Iceberg catalog module in Hive

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24958?focusedWorklogId=581731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581731
 ]

ASF GitHub Bot logged work on HIVE-24958:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 12:31
Start Date: 13/Apr/21 12:31
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2138:
URL: https://github.com/apache/hive/pull/2138


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581731)
Time Spent: 50m  (was: 40m)

> Create Iceberg catalog module in Hive
> -
>
> Key: HIVE-24958
> URL: https://issues.apache.org/jira/browse/HIVE-24958
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> * Create a new iceberg-catalog module in Hive, with the code currently 
> contained in Iceberg's iceberg-hive-metastore module
>  * Make sure all tests pass (including static analysis and checkstyle)
>  * Make iceberg-handler depend on this module instead of 
> iceberg-hive-metastore



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581698
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 11:06
Start Date: 13/Apr/21 11:06
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612347286



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
   Map<String, MapOutputInfo> mapOutputInfoMap =
   new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   Thanks Laszlo! sounds like a plan! 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581698)
Time Spent: 2.5h  (was: 2h 20m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a possible performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25007) Implement insert overwrite for Iceberg tables

2021-04-13 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25007:
-


> Implement insert overwrite for Iceberg tables
> -
>
> Key: HIVE-25007
> URL: https://issues.apache.org/jira/browse/HIVE-25007
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581694
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 11:00
Start Date: 13/Apr/21 11:00
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612343543



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
   Map<String, MapOutputInfo> mapOutputInfoMap =
   new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   okay, in this case I'll have to include some unit tests here (which are 
part of tez codebase already) + create a simple repro to share with netty 
community




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581694)
Time Spent: 2h 20m  (was: 2h 10m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a possible performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581692
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 11:00
Start Date: 13/Apr/21 11:00
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612343543



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
   Map<String, MapOutputInfo> mapOutputInfoMap =
   new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   okay, in this case I'll have to include some unit tests here (which 
might be part of tez code already) + create a simple repro to share with netty 
community




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581692)
Time Spent: 2h 10m  (was: 2h)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a possible performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita resolved HIVE-24981.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master. Thanks for the review [~pvary]

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581689
 ]

ASF GitHub Bot logged work on HIVE-24981:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:51
Start Date: 13/Apr/21 10:51
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #2168:
URL: https://github.com/apache/hive/pull/2168


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581689)
Time Spent: 1h  (was: 50m)

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581687
 ]

ASF GitHub Bot logged work on HIVE-24981:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:49
Start Date: 13/Apr/21 10:49
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #2168:
URL: https://github.com/apache/hive/pull/2168#discussion_r612337484



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.util;
+
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+public class HiveStrictManagedMigrationControlConfig {
+
+  private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>();
+
+  public Map<String, List<String>> getDatabaseIncludeLists() {
+return databaseIncludeLists;
+  }
+
+  public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) {
+this.databaseIncludeLists = databaseIncludeLists;
+  }
+
+  public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) {
+for (String db : other.getDatabaseIncludeLists().keySet()) {
+  List<String> theseTables = this.databaseIncludeLists.get(db);
+  List<String> otherTables = other.getDatabaseIncludeLists().get(db);
+  if (theseTables == null) {
+this.databaseIncludeLists.put(db, otherTables);

Review comment:
   I want to merge the lists if they're present. To do this, checking their 
existence is something I need to do anyway, so putIfAbsent cannot make my code 
more compact here unfortunately.
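
As an editorial aside, the merge being discussed can also be expressed with the stdlib `Map.merge`, which handles both the absent and present cases in one call. The following is a hypothetical, stdlib-only sketch (class and method names invented here), not the actual Hive code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the control-config merge discussed above:
// per-database include lists are combined, concatenating table lists
// when both sides mention the same database.
public class IncludeListMerge {

    public static void putAllFrom(Map<String, List<String>> target,
                                  Map<String, List<String>> other) {
        for (Map.Entry<String, List<String>> e : other.entrySet()) {
            // Map.merge covers both cases: if the key is new, the other
            // list is stored as-is; otherwise the remapping function
            // concatenates the two lists.
            target.merge(e.getKey(), e.getValue(), (a, b) -> {
                List<String> merged = new ArrayList<>(a);
                merged.addAll(b);
                return merged;
            });
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> mine = new TreeMap<>();
        mine.put("db1", new ArrayList<>(List.of("t1")));

        Map<String, List<String>> theirs = new TreeMap<>();
        theirs.put("db1", List.of("t2"));
        theirs.put("db2", List.of("t3"));

        putAllFrom(mine, theirs);
        System.out.println(mine); // {db1=[t1, t2], db2=[t3]}
    }
}
```

Whether this reads as more compact than the explicit null check is debatable, which is consistent with the reviewer's point.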




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581687)
Time Spent: 50m  (was: 40m)

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it possible for users to prepare control files with the lists 
> of required items to migrate and feed these to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581684
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:44
Start Date: 13/Apr/21 10:44
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612334078



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
      Map<String, MapOutputInfo> mapOutputInfoMap =
          new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   Got it, this is helpful, but let's make sure this is expected from Netty's 
side of things before committing -- this would be helpful for the Tez change as 
well :) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581684)
Time Spent: 2h  (was: 1h 50m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581683
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:38
Start Date: 13/Apr/21 10:38
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612330581



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
      Map<String, MapOutputInfo> mapOutputInfoMap =
          new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   good catch :) this is an epic workaround for a problem that I haven't 
been able to figure out 100%, here are some details:
   
https://issues.apache.org/jira/browse/TEZ-4157?focusedCommentId=17100835&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17100835
   
   (btw: with netty3, we didn't need this)
   
   are you fine with a comment explaining this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581683)
Time Spent: 1h 50m  (was: 1h 40m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581682
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:38
Start Date: 13/Apr/21 10:38
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612330581



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent evt)
 
      Map<String, MapOutputInfo> mapOutputInfoMap =
          new HashMap<String, MapOutputInfo>();
-  Channel ch = evt.getChannel();
-
+  Channel ch = ctx.channel();
   // In case of KeepAlive, ensure that timeout handler does not close 
connection until entire
   // response is written (i.e, response headers + mapOutput).
-  ChannelPipeline pipeline = ch.getPipeline();
+  ChannelPipeline pipeline = ch.pipeline();
   TimeoutHandler timeoutHandler = 
(TimeoutHandler)pipeline.get(TIMEOUT_HANDLER);
   timeoutHandler.setEnabledTimeout(false);
 
   String user = userRsrc.get(jobId);
-
+  if (keepAliveParam || connectionKeepAliveEnabled){

Review comment:
   good catch :) this is an epic workaround that I haven't been able to 
figure out, here are some details:
   
https://issues.apache.org/jira/browse/TEZ-4157?focusedCommentId=17100835&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17100835
   
   (btw: with netty3, we didn't need this)
   
   are you fine with a comment explaining this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581682)
Time Spent: 1h 40m  (was: 1.5h)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581680
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:36
Start Date: 13/Apr/21 10:36
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612329298



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java
##
@@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long 
position)
   throws IOException {
 if (manageOsCache && readaheadPool != null) {
   readaheadRequest = readaheadPool.readaheadStream(identifier, fd,
-  getPosition() + position, readaheadLength,
-  getPosition() + getCount(), readaheadRequest);
+  position() + position, readaheadLength,
+  position() + count(), readaheadRequest);
 }
-
+long written = 0;

Review comment:
   Got it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581680)
Time Spent: 1.5h  (was: 1h 20m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581679
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:34
Start Date: 13/Apr/21 10:34
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612328280



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java
##
@@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long 
position)
   throws IOException {
 if (manageOsCache && readaheadPool != null) {
   readaheadRequest = readaheadPool.readaheadStream(identifier, fd,
-  getPosition() + position, readaheadLength,
-  getPosition() + getCount(), readaheadRequest);
+  position() + position, readaheadLength,
+  position() + count(), readaheadRequest);
 }
-
+long written = 0;

Review comment:
   looks better, but I don't think it's correct: in case of an exception 
during the transfer, we should not have set transferred=true
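
The ordering concern raised here (the flag must not be set when the transfer throws) can be illustrated with a small stdlib-only sketch; the class and interface names below are invented for illustration and are not the actual FadvisedFileRegion code:

```java
// Sketch of the ordering discussed above: the "transferred" flag is only
// set after the transfer call returns normally, so an exception leaves
// the flag false and cleanup logic can still distinguish the two cases.
public class TransferFlagSketch {

    private boolean transferred = false;

    // Hypothetical stand-in for the underlying channel transfer.
    public interface Transfer {
        long run() throws java.io.IOException;
    }

    public long transferTo(Transfer transfer) throws java.io.IOException {
        long written = transfer.run(); // may throw
        transferred = true;            // reached only on success
        return written;
    }

    public boolean isTransferred() {
        return transferred;
    }

    public static void main(String[] args) throws Exception {
        TransferFlagSketch ok = new TransferFlagSketch();
        ok.transferTo(() -> 42L);
        System.out.println(ok.isTransferred()); // true

        TransferFlagSketch failing = new TransferFlagSketch();
        try {
            failing.transferTo(() -> { throw new java.io.IOException("boom"); });
        } catch (java.io.IOException expected) {
            // flag stays false after a failed transfer
        }
        System.out.println(failing.isTransferred()); // false
    }
}
```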




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581679)
Time Spent: 1h 20m  (was: 1h 10m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581677
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:31
Start Date: 13/Apr/21 10:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612326445



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java
##
@@ -124,39 +149,33 @@ long customShuffleTransfer(WritableByteChannel target, 
long position)
 position += trans; 
 trans = 0;
   }
-  
+
   //write data to the target
   while(byteBuffer.hasRemaining()) {
 target.write(byteBuffer);
   }
   
   byteBuffer.clear();
 }
-
+
 return actualCount - trans;
   }
 
-  
-  @Override
-  public void releaseExternalResources() {
-if (readaheadRequest != null) {
-  readaheadRequest.cancel();
-}
-super.releaseExternalResources();
-  }
-  
   /**
* Call when the transfer completes successfully so we can advise the OS that
* we don't need the region to be cached anymore.
*/
   public void transferSuccessful() {
-if (manageOsCache && getCount() > 0) {
+if (manageOsCache && count() > 0) {
   try {
 if (canEvictAfterTransfer) {
-  LOG.debug("shuffleBufferSize: {}, path: {}", shuffleBufferSize, 
identifier);
-  
NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(identifier,
-  fd, getPosition(), getCount(),
-  NativeIO.POSIX.POSIX_FADV_DONTNEED);
+  if (fd.valid()) {

Review comment:
   hm, thought this over again: the fd.valid() check was needed while I 
wasn't handling the deallocate() stuff properly, but now, at this point, fd 
should be valid... initially I left this check here because I thought that an 
invalid fd is not a problem (which is true, we just won't advise the OS cache, and 
that's it), but as we already have a try/catch, we don't need this check 
(we'll have the exception in the logs anyway)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581677)
Time Spent: 1h 10m  (was: 1h)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581674
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:27
Start Date: 13/Apr/21 10:27
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612323669



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -339,27 +350,60 @@ private ShuffleHandler(Configuration conf) {
 
 
   public void start() throws Exception {
-ServerBootstrap bootstrap = new ServerBootstrap(selector);
-// Timer is shared across entire factory and must be released separately
-timer = new HashedWheelTimer();
-try {
-  pipelineFact = new HttpPipelineFactory(conf, timer);
-} catch (Exception ex) {
-  throw new RuntimeException(ex);
-}
-bootstrap.setPipelineFactory(pipelineFact);
-bootstrap.setOption("backlog", NetUtil.SOMAXCONN);
+ServerBootstrap bootstrap = new ServerBootstrap()
+.channel(NioServerSocketChannel.class)
+.group(bossGroup, workerGroup)
+.localAddress(port)
+.option(ChannelOption.SO_BACKLOG, NetUtil.SOMAXCONN)
+.childOption(ChannelOption.SO_KEEPALIVE, true);
+initPipeline(bootstrap, conf);
+
 port = conf.getInt(SHUFFLE_PORT_CONFIG_KEY, DEFAULT_SHUFFLE_PORT);
-Channel ch = bootstrap.bind(new InetSocketAddress(port));
+Channel ch = bootstrap.bind().sync().channel();
 accepted.add(ch);
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
 conf.set(SHUFFLE_PORT_CONFIG_KEY, Integer.toString(port));
-pipelineFact.SHUFFLE.setPort(port);
+SHUFFLE.setPort(port);
 if (dirWatcher != null) {
   dirWatcher.start();
 }
-LOG.info("LlapShuffleHandler" + " listening on port " + port + " 
(SOMAXCONN: " + bootstrap.getOption("backlog")
-  + ")");
+LOG.info("LlapShuffleHandler listening on port {} (SOMAXCONN: {})", port, 
NetUtil.SOMAXCONN);
+  }
+
+  private void initPipeline(ServerBootstrap bootstrap, Configuration conf) 
throws Exception {
+SHUFFLE = getShuffle(conf);
+// TODO Setup SSL Shuffle

Review comment:
   I think we don't support SSL shuffle for LLAP at the moment (plus the 
comment is quite old); e.g. in Cloudera's data warehouse, SSL on shuffle is 
handled transparently by the environment.
   I haven't touched this part in this patch, and I'm not even sure what the plan is 
:) that's why I simply kept this as is




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581674)
Time Spent: 1h  (was: 50m)

> LLAP ShuffleHandler: upgrade to netty4
> --
>
> Key: HIVE-24524
> URL: https://issues.apache.org/jira/browse/HIVE-24524
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is said to offer a performance improvement over Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the 
> code bases are very similar).
> Background:
> netty4 migration guideline: 
> https://netty.io/wiki/new-and-noteworthy-in-4.0.html
> articles of possible performance improvement:
> https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html
> https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/
> some other notes: Netty3 is EOL since 2016:
> https://netty.io/news/2016/06/29/3-10-6-Final.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581667
 ]

ASF GitHub Bot logged work on HIVE-24524:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 10:11
Start Date: 13/Apr/21 10:11
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1778:
URL: https://github.com/apache/hive/pull/1778#discussion_r612289762



##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java
##
@@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long 
position)
   throws IOException {
 if (manageOsCache && readaheadPool != null) {
   readaheadRequest = readaheadPool.readaheadStream(identifier, fd,
-  getPosition() + position, readaheadLength,
-  getPosition() + getCount(), readaheadRequest);
+  position() + position, readaheadLength,
+  position() + count(), readaheadRequest);
 }
-
+long written = 0;

Review comment:
   Shall we simplify this to:
   
   ```
   transferred = true;
   if (this.shuffleTransferToAllowed) {
 return super.transferTo(target, position);
   }
   return  customShuffleTransfer(target, position);
   ```
 
   
   

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -339,27 +350,60 @@ private ShuffleHandler(Configuration conf) {
 
 
   public void start() throws Exception {
-ServerBootstrap bootstrap = new ServerBootstrap(selector);
-// Timer is shared across entire factory and must be released separately
-timer = new HashedWheelTimer();
-try {
-  pipelineFact = new HttpPipelineFactory(conf, timer);
-} catch (Exception ex) {
-  throw new RuntimeException(ex);
-}
-bootstrap.setPipelineFactory(pipelineFact);
-bootstrap.setOption("backlog", NetUtil.SOMAXCONN);
+ServerBootstrap bootstrap = new ServerBootstrap()
+.channel(NioServerSocketChannel.class)
+.group(bossGroup, workerGroup)
+.localAddress(port)
+.option(ChannelOption.SO_BACKLOG, NetUtil.SOMAXCONN)
+.childOption(ChannelOption.SO_KEEPALIVE, true);
+initPipeline(bootstrap, conf);
+
 port = conf.getInt(SHUFFLE_PORT_CONFIG_KEY, DEFAULT_SHUFFLE_PORT);
-Channel ch = bootstrap.bind(new InetSocketAddress(port));
+Channel ch = bootstrap.bind().sync().channel();
 accepted.add(ch);
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
 conf.set(SHUFFLE_PORT_CONFIG_KEY, Integer.toString(port));
-pipelineFact.SHUFFLE.setPort(port);
+SHUFFLE.setPort(port);
 if (dirWatcher != null) {
   dirWatcher.start();
 }
-LOG.info("LlapShuffleHandler" + " listening on port " + port + " 
(SOMAXCONN: " + bootstrap.getOption("backlog")
-  + ")");
+LOG.info("LlapShuffleHandler listening on port {} (SOMAXCONN: {})", port, 
NetUtil.SOMAXCONN);
+  }
+
+  private void initPipeline(ServerBootstrap bootstrap, Configuration conf) 
throws Exception {
+SHUFFLE = getShuffle(conf);
+// TODO Setup SSL Shuffle

Review comment:
   I know this is copy-pasted from below, but do we have a ticket for this?
   Is it still needed?

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
##
@@ -1031,25 +1038,14 @@ protected ChannelFuture 
sendMapOutput(ChannelHandlerContext ctx, Channel ch,
 info.getStartOffset(), info.getPartLength(), manageOsCache, 
readaheadLength,
 readaheadPool, spillfile.getAbsolutePath(), 
 shuffleBufferSize, shuffleTransferToAllowed, 
canEvictAfterTransfer);
-writeFuture = ch.write(partition);
-writeFuture.addListener(new ChannelFutureListener() {
-// TODO error handling; distinguish IO/connection failures,
-//  attribute to appropriate spill output
-  @Override
-  public void operationComplete(ChannelFuture future) {
-if (future.isSuccess()) {
-  partition.transferSuccessful();
-}
-partition.releaseExternalResources();
-  }
-});
+writeFuture = ch.writeAndFlush(partition);

Review comment:
   This looks much cleaner with deallocate() call replacing completion 
Listeners

##
File path: 
llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java
##
@@ -124,39 +149,33 @@ long customShuffleTransfer(WritableByteChannel target, 
long position)
 position += trans; 
 trans = 0;
   }
-  
+
   //write data to the target
   while(byteBuffer.hasRemaining()) {
 target.write(byteBuffer);
   }
   
   

[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=581657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581657
 ]

ASF GitHub Bot logged work on HIVE-24974:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:43
Start Date: 13/Apr/21 09:43
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #2148:
URL: https://github.com/apache/hive/pull/2148#discussion_r612295409



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java
##
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.metrics;
+
+import com.google.common.cache.Cache;
+import com.google.common.cache.CacheBuilder;
+
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.Maps;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.common.metrics.common.Metrics;
+import org.apache.hadoop.hive.common.metrics.common.MetricsFactory;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.io.AcidDirectory;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+
+import org.apache.tez.common.counters.TezCounter;
+import org.apache.tez.common.counters.TezCounters;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.Comparator;
+
+import java.util.Queue;
+import java.util.TreeMap;
+import java.util.concurrent.Executors;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.PriorityBlockingQueue;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Collects and publishes ACID compaction related metrics.
+ */
+public class DeltaFilesMetricReporter {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AcidUtils.class);
+
+  public static final String NUM_OBSOLETE_DELTAS = 
"HIVE_ACID_NUM_OBSOLETE_DELTAS";

Review comment:
   I think part of the plan was also a 3rd metric: the number of deltas where 
the size is less than x% of the base (small deltas)?
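
The suggested third metric can be sketched as a simple size comparison; the method and parameter names below are hypothetical and not the actual Hive implementation:

```java
// Sketch of a "small deltas" count: a delta is small when its size is
// below a configurable percentage of the base file size.
public class SmallDeltaMetric {

    public static int countSmallDeltas(long baseSize, long[] deltaSizes, double thresholdPct) {
        long cutoff = (long) (baseSize * thresholdPct / 100.0);
        int count = 0;
        for (long size : deltaSizes) {
            if (size < cutoff) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long base = 1000L;
        long[] deltas = {5L, 40L, 120L};
        // threshold 10% -> cutoff 100: the deltas of size 5 and 40 are "small"
        System.out.println(countSmallDeltas(base, deltas, 10.0)); // 2
    }
}
```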




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581657)
Time Spent: 0.5h  (was: 20m)

> Create new metrics about the number of delta files in the ACID table
> 
>
> Key: HIVE-24974
> URL: https://issues.apache.org/jira/browse/HIVE-24974
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 2 metrics should be collected by each table/partition that exceeds some limit.
>  * Number of used deltas
>  * Number of obsolete deltas
> Both of them should be collected in AcidUtils.getAcidstate call, and only be 
> published if they reached a configurable threshold (to not pollute metrics) 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24985) Create new metrics about locks

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24985?focusedWorklogId=581655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581655
 ]

ASF GitHub Bot logged work on HIVE-24985:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:38
Start Date: 13/Apr/21 09:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2158:
URL: https://github.com/apache/hive/pull/2158#discussion_r612292081



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java
##
@@ -415,60 +415,64 @@ public void testDBMetrics() throws Exception {
 String dbName = "default";
 String tblName = "dcamc";
 Table t = newTable(dbName, tblName, false);
-burnThroughTransactions(t.getDbName(), t.getTableName(), 24);
 
-// create and commit txn with non-empty txn_components
+long start = System.currentTimeMillis() - 1000L;

Review comment:
   copy -> fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581655)
Time Spent: 50m  (was: 40m)

> Create new metrics about locks
> --
>
> Key: HIVE-24985
> URL: https://issues.apache.org/jira/browse/HIVE-24985
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Basic metrics that can help investigate.
> Ideas:
> *  number of locks
> * oldest lock's age



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581647
 ]

ASF GitHub Bot logged work on HIVE-24981:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:07
Start Date: 13/Apr/21 09:07
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2168:
URL: https://github.com/apache/hive/pull/2168#discussion_r612270073



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.util;
+
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+public class HiveStrictManagedMigrationControlConfig {
+
+  private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>();
+
+  public Map<String, List<String>> getDatabaseIncludeLists() {
+    return databaseIncludeLists;
+  }
+
+  public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) {
+    this.databaseIncludeLists = databaseIncludeLists;
+  }
+
+  public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) {
+    for (String db : other.getDatabaseIncludeLists().keySet()) {
+      List<String> theseTables = this.databaseIncludeLists.get(db);
+      List<String> otherTables = other.getDatabaseIncludeLists().get(db);
+      if (theseTables == null) {
+        this.databaseIncludeLists.put(db, otherTables);

Review comment:
   Nevemind




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581647)
Time Spent: 40m  (was: 0.5h)

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?focusedWorklogId=581644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581644
 ]

ASF GitHub Bot logged work on HIVE-25004:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:06
Start Date: 13/Apr/21 09:06
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on pull request #2170:
URL: https://github.com/apache/hive/pull/2170#issuecomment-818578373


   cc: @mustafaiman 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581644)
Time Spent: 20m  (was: 10m)

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581641
 ]

ASF GitHub Bot logged work on HIVE-24981:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:06
Start Date: 13/Apr/21 09:06
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2168:
URL: https://github.com/apache/hive/pull/2168#discussion_r612268649



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.util;
+
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+public class HiveStrictManagedMigrationControlConfig {
+
+  private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>();
+
+  public Map<String, List<String>> getDatabaseIncludeLists() {
+    return databaseIncludeLists;
+  }
+
+  public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) {
+    this.databaseIncludeLists = databaseIncludeLists;
+  }
+
+  public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) {
+    for (String db : other.getDatabaseIncludeLists().keySet()) {
+      List<String> theseTables = this.databaseIncludeLists.get(db);
+      List<String> otherTables = other.getDatabaseIncludeLists().get(db);
+      if (theseTables == null) {
+        this.databaseIncludeLists.put(db, otherTables);

Review comment:
   Maybe: putIfAbsent?
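
The reviewer's `putIfAbsent` suggestion could replace the explicit null check quoted above. The sketch below is a hypothetical, simplified version of such a merge (the class name `MergeSketch` and the `addAll` handling of a shared db key are illustrative assumptions, not the patch's actual semantics):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {
  // Hypothetical simplified merge in the spirit of putAllFromConfig:
  // putIfAbsent inserts the other config's table list only when the db key
  // is not yet present; otherwise the existing list absorbs the new tables.
  static void merge(Map<String, List<String>> target,
                    Map<String, List<String>> other) {
    for (Map.Entry<String, List<String>> e : other.entrySet()) {
      List<String> existing = target.putIfAbsent(e.getKey(), e.getValue());
      if (existing != null) {
        existing.addAll(e.getValue()); // merge table lists for a shared db
      }
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> a = new TreeMap<>();
    a.put("db1", new ArrayList<>(List.of("t1")));
    Map<String, List<String>> b = new TreeMap<>();
    b.put("db1", List.of("t2"));
    b.put("db2", List.of("t3"));
    merge(a, b);
    System.out.println(a);
  }
}
```

`putIfAbsent` returns the previously mapped value (or `null`), which makes the "already present" branch explicit without a separate `get`/`containsKey` pair.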




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581641)
Time Spent: 0.5h  (was: 20m)

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?focusedWorklogId=581642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581642
 ]

ASF GitHub Bot logged work on HIVE-24997:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:06
Start Date: 13/Apr/21 09:06
Worklog Time Spent: 10m 
  Work Description: zeroflag commented on pull request #2166:
URL: https://github.com/apache/hive/pull/2166#issuecomment-818578215


   cc: @mustafaiman 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581642)
Time Spent: 20m  (was: 10m)

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since  HIVE-24230  it assumes the UDF is evaluated on HS2 which is not true 
> in general. The SessionState is only available at compile time evaluation but 
> later on a new interpreter should be instantiated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection

2021-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581640
 ]

ASF GitHub Bot logged work on HIVE-24981:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 09:02
Start Date: 13/Apr/21 09:02
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2168:
URL: https://github.com/apache/hive/pull/2168#discussion_r612266398



##
File path: ql/src/test/resources/hsmm/hsmm_cfg_01.yaml
##
@@ -0,0 +1,9 @@
+databaseIncludeLists:

Review comment:
   It took me some time to understand that `databaseIncludeLists` controls 
the tables too.
   Maybe `migrationLists`, or something like that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581640)
Time Spent: 20m  (was: 10m)

> Add control file option to HiveStrictManagedMigration for DB/table selection
> 
>
> Key: HIVE-24981
> URL: https://issues.apache.org/jira/browse/HIVE-24981
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently HiveStrictManagedMigration supports db regex and table regex 
> options that allow the user to specify what Hive entities it should deal 
> with. In cases where we have thousands of tables across thousands of DBs 
> iterating through everything takes a lot of time, while specifying a set of 
> tables/DBs with regexes is cumbersome.
> We should make it available for users to prepare control files with the lists 
> of required items to migrate and feed this to the tool. A directory path 
> pointing to these control files would be taken as a new option for HSMM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM

2021-04-13 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25006:
-


> Commit Iceberg writes in HiveMetaHook instead of TezAM
> --
>
> Key: HIVE-25006
> URL: https://issues.apache.org/jira/browse/HIVE-25006
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. 
> This will enable us to implement insert overwrites for iceberg tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)