[jira] [Assigned] (HIVE-24933) Replication fails for transactional tables having same name as dropped non-transactional table
[ https://issues.apache.org/jira/browse/HIVE-24933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyushotpal Madhukar reassigned HIVE-24933: - Assignee: Pratyushotpal Madhukar > Replication fails for transactional tables having same name as dropped > non-transactional table > -- > > Key: HIVE-24933 > URL: https://issues.apache.org/jira/browse/HIVE-24933 > Project: Hive > Issue Type: Bug >Reporter: Pratyushotpal Madhukar >Assignee: Pratyushotpal Madhukar >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.
[ https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=582202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582202 ] ASF GitHub Bot logged work on HIVE-24978: - Author: ASF GitHub Bot Created on: 14/Apr/21 05:11 Start Date: 14/Apr/21 05:11 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2154: URL: https://github.com/apache/hive/pull/2154#discussion_r612939857 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java ## @@ -4294,6 +4295,9 @@ public void testMoveOptimizationIncremental() throws IOException { @Test public void testDatabaseInJobName() throws Throwable { +// Clean up configurations +driver.getConf().set(JobContext.JOB_NAME, ""); Review comment: can this clean up be done at tear down? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582202) Time Spent: 50m (was: 40m) > Optimise number of DROP_PARTITION events created. > - > > Key: HIVE-24978 > URL: https://issues.apache.org/jira/browse/HIVE-24978 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Even for drop partition with batches, there is presently one event for every > partition; optimise by merging them to reduce the number of calls to HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
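The batching idea behind HIVE-24978 — emitting one DROP_PARTITION event per batch of partitions instead of one per partition — can be sketched in plain Java. This is a minimal illustration only; the class and method names below are invented and are not Hive's actual metastore API:

```java
import java.util.ArrayList;
import java.util.List;

public class DropPartitionBatcher {
    /**
     * Groups partition names into fixed-size batches so that a single
     * DROP_PARTITION event (and a single HMS call) can cover many partitions.
     */
    public static List<List<String>> toBatches(List<String> partitions, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                partitions.subList(i, Math.min(i + batchSize, partitions.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            parts.add("dt=2021-04-0" + i);
        }
        // 10 partitions in batches of 4 -> 3 drop events instead of 10
        System.out.println(toBatches(parts, 4).size());
    }
}
```

With 10 partitions and a batch size of 4 this yields 3 batches, i.e. 3 events and 3 HMS calls rather than 10.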
[jira] [Commented] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320702#comment-17320702 ] Peter Vary commented on HIVE-25011: --- CC: [~dkuzmenko] > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch, HIVE-25011.2.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24969) Predicates are removed by PPD when left semi join followed by lateral view
[ https://issues.apache.org/jira/browse/HIVE-24969?focusedWorklogId=582187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582187 ] ASF GitHub Bot logged work on HIVE-24969: - Author: ASF GitHub Bot Created on: 14/Apr/21 04:19 Start Date: 14/Apr/21 04:19 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #2145: URL: https://github.com/apache/hive/pull/2145#issuecomment-819216494 @kasakrisz @maheshk114 @jcamachor any thoughts here? Thanks, Zhihua Deng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582187) Time Spent: 1h (was: 50m) > Predicates are removed by PPD when left semi join followed by lateral view > -- > > Key: HIVE-24969 > URL: https://issues.apache.org/jira/browse/HIVE-24969 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Steps to reproduce: > {code:java} > select count(distinct logItem.triggerId) > from service_stat_log LATERAL VIEW explode(logItems) LogItemTable AS logItem > where logItem.dsp in ('delivery', 'ocpa') > and logItem.iswin = true > and logItem.adid in ( > select distinct adId > from ad_info > where subAccountId in (16010, 14863)); {code} > The predicates _logItem.dsp in ('delivery', 'ocpa')_ and _logItem.iswin = > true_ are removed when doing PPD: JOIN -> RS -> LVJ. The JOIN has > candidates: logitem -> [logItem.dsp in ('delivery', 'ocpa'), logItem.iswin = > true]; when pushing them to the RS followed by LVJ, none of them are pushed, and > the candidates of logitem are finally removed by default, which leads to the > wrong result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25012) Parsing table alias is failing if query has table properties specified
[ https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25012 started by Krisztian Kasa. - > Parsing table alias is failing if query has table properties specified > -- > > Key: HIVE-25012 > URL: https://issues.apache.org/jira/browse/HIVE-25012 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from > t1('acid.fetch.deleted.rows'='true') > join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; > {code} > When creating the Join RelNode, the aliases are used to look up the left and right > input RelNodes. Aliases are extracted from the AST subtree of the left and > right inputs of the join AST node. In case of a table reference: > {code} > TOK_TABREF >TOK_TABNAME > t1 >TOK_TABLEPROPERTIES > TOK_TABLEPROPLIST > TOK_TABLEPROPERTY > 'acid.fetch.deleted.rows' > 'true' > {code} > Prior to HIVE-24854, the queries mentioned above failed because the existing solution did > not expect TOK_TABLEPROPERTIES. > The goal of this patch is to parse TOK_TABREF properly using the existing > solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25012) Parsing table alias is failing if query has table properties specified
[ https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25012: -- Fix Version/s: 4.0.0 > Parsing table alias is failing if query has table properties specified > -- > > Key: HIVE-25012 > URL: https://issues.apache.org/jira/browse/HIVE-25012 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {code} > select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from > t1('acid.fetch.deleted.rows'='true') > join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; > {code} > When creating the Join RelNode, the aliases are used to look up the left and right > input RelNodes. Aliases are extracted from the AST subtree of the left and > right inputs of the join AST node. In case of a table reference: > {code} > TOK_TABREF >TOK_TABNAME > t1 >TOK_TABLEPROPERTIES > TOK_TABLEPROPLIST > TOK_TABLEPROPERTY > 'acid.fetch.deleted.rows' > 'true' > {code} > Prior to HIVE-24854, the queries mentioned above failed because the existing solution did > not expect TOK_TABLEPROPERTIES. > The goal of this patch is to parse TOK_TABREF properly using the existing > solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25012) Parsing table alias is failing if query has table properties specified
[ https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25012: -- Labels: pull-request-available (was: ) > Parsing table alias is failing if query has table properties specified > -- > > Key: HIVE-25012 > URL: https://issues.apache.org/jira/browse/HIVE-25012 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from > t1('acid.fetch.deleted.rows'='true') > join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; > {code} > When creating the Join RelNode, the aliases are used to look up the left and right > input RelNodes. Aliases are extracted from the AST subtree of the left and > right inputs of the join AST node. In case of a table reference: > {code} > TOK_TABREF >TOK_TABNAME > t1 >TOK_TABLEPROPERTIES > TOK_TABLEPROPLIST > TOK_TABLEPROPERTY > 'acid.fetch.deleted.rows' > 'true' > {code} > Prior to HIVE-24854, the queries mentioned above failed because the existing solution did > not expect TOK_TABLEPROPERTIES. > The goal of this patch is to parse TOK_TABREF properly using the existing > solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25012) Parsing table alias is failing if query has table properties specified
[ https://issues.apache.org/jira/browse/HIVE-25012?focusedWorklogId=582166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582166 ] ASF GitHub Bot logged work on HIVE-25012: - Author: ASF GitHub Bot Created on: 14/Apr/21 03:00 Start Date: 14/Apr/21 03:00 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #2177: URL: https://github.com/apache/hive/pull/2177 ### What changes were proposed in this pull request? Move `getSimpleTableNameBase` and `findTabRefIdxs` to allow using them from `BaseSemanticAnalyzer.getTableAlias` ### Why are the changes needed? See jira. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ``` mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=fetch_deleted_rows.q -pl itests/qtest -Pitests ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582166) Remaining Estimate: 0h Time Spent: 10m > Parsing table alias is failing if query has table properties specified > -- > > Key: HIVE-25012 > URL: https://issues.apache.org/jira/browse/HIVE-25012 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > {code} > select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from > t1('acid.fetch.deleted.rows'='true') > join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; > {code} > When creating the Join RelNode, the aliases are used to look up the left and right > input RelNodes. Aliases are extracted from the AST subtree of the left and > right inputs of the join AST node. In case of a table reference: > {code} > TOK_TABREF >TOK_TABNAME > t1 >TOK_TABLEPROPERTIES > TOK_TABLEPROPLIST > TOK_TABLEPROPERTY > 'acid.fetch.deleted.rows' > 'true' > {code} > Prior to HIVE-24854, the queries mentioned above failed because the existing solution did > not expect TOK_TABLEPROPERTIES. > The goal of this patch is to parse TOK_TABREF properly using the existing > solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25012) Parsing table alias is failing if query has table properties specified
[ https://issues.apache.org/jira/browse/HIVE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25012: - > Parsing table alias is failing if query has table properties specified > -- > > Key: HIVE-25012 > URL: https://issues.apache.org/jira/browse/HIVE-25012 > Project: Hive > Issue Type: Bug > Components: CBO, Parser >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > {code} > select t1.ROW__IS__DELETED, t1.*, t2.ROW__IS__DELETED, t2.* from > t1('acid.fetch.deleted.rows'='true') > join t2('acid.fetch.deleted.rows'='true') on t1.a = t2.a; > {code} > When creating the Join RelNode, the aliases are used to look up the left and right > input RelNodes. Aliases are extracted from the AST subtree of the left and > right inputs of the join AST node. In case of a table reference: > {code} > TOK_TABREF >TOK_TABNAME > t1 >TOK_TABLEPROPERTIES > TOK_TABLEPROPLIST > TOK_TABLEPROPERTY > 'acid.fetch.deleted.rows' > 'true' > {code} > Prior to HIVE-24854, the queries mentioned above failed because the existing solution did > not expect TOK_TABLEPROPERTIES. > The goal of this patch is to parse TOK_TABREF properly using the existing > solution also used in SemanticAnalyzer.doPhase1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations
[ https://issues.apache.org/jira/browse/HIVE-24854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-24854: --- Fix Version/s: 4.0.0 > Incremental Materialized view refresh in presence of update/delete operations > - > > Key: HIVE-24854 > URL: https://issues.apache.org/jira/browse/HIVE-24854 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > The current implementation of incremental Materialized view rebuild cannot be used if any of > the Materialized view source tables has had an update or delete operation since the > last rebuild. In such cases a full rebuild should be performed. > Steps to enable incremental rebuild: > 1. Introduce a new virtual column to mark a row deleted > 2. Execute the query in the view definition > 2.a. Add a filter to each table scan in order to pull only the rows from each > source table which have a higher writeId than the writeId of the last rebuild > - this is already implemented by the current incremental rebuild > 2.b Add the row-is-deleted virtual column to each table scan. In join nodes, if > any of the branches has a deleted row, the result row is also deleted. > We should distinguish two types of view definition queries: with and without > Aggregate. > 3.a No aggregate path: > Rewrite the plan of the full rebuild to a multi insert statement with two > insert branches. One branch to insert new rows into the materialized view > table and the second one to insert deleted rows into the materialized view > delete delta. > 3.b Aggregate path: TBD > Prerequisite: > source tables haven't been compacted since the last MV rebuild -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24993) AssertionError when referencing ROW__ID.writeId
[ https://issues.apache.org/jira/browse/HIVE-24993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24993. --- Resolution: Fixed > AssertionError when referencing ROW__ID.writeId > --- > > Key: HIVE-24993 > URL: https://issues.apache.org/jira/browse/HIVE-24993 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > set hive.support.concurrency=true; > create table t1(a int, b float) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a, b) values (1, 1.1); > insert into t1(a, b) values (2, 2.2); > SELECT t1.ROW__ID > FROM t1 > WHERE t1.ROW__ID.writeid > 1; > {code} > {code} > java.lang.AssertionError > at > org.apache.hadoop.hive.ql.parse.UnparseTranslator.addTranslation(UnparseTranslator.java:123) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5680) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5570) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5530) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3385) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3706) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3717) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5281) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1839) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1785) > at > 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130) > at > org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915) > at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179) > at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1546) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:563) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12582) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
[jira] [Resolved] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations
[ https://issues.apache.org/jira/browse/HIVE-24854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24854. --- Resolution: Fixed Pushed to master. Thanks [~jcamachorodriguez] for review. > Incremental Materialized view refresh in presence of update/delete operations > - > > Key: HIVE-24854 > URL: https://issues.apache.org/jira/browse/HIVE-24854 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > The current implementation of incremental Materialized view rebuild cannot be used if any of > the Materialized view source tables has had an update or delete operation since the > last rebuild. In such cases a full rebuild should be performed. > Steps to enable incremental rebuild: > 1. Introduce a new virtual column to mark a row deleted > 2. Execute the query in the view definition > 2.a. Add a filter to each table scan in order to pull only the rows from each > source table which have a higher writeId than the writeId of the last rebuild > - this is already implemented by the current incremental rebuild > 2.b Add the row-is-deleted virtual column to each table scan. In join nodes, if > any of the branches has a deleted row, the result row is also deleted. > We should distinguish two types of view definition queries: with and without > Aggregate. > 3.a No aggregate path: > Rewrite the plan of the full rebuild to a multi insert statement with two > insert branches. One branch to insert new rows into the materialized view > table and the second one to insert deleted rows into the materialized view > delete delta. > 3.b Aggregate path: TBD > Prerequisite: > source tables haven't been compacted since the last MV rebuild -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24854) Incremental Materialized view refresh in presence of update/delete operations
[ https://issues.apache.org/jira/browse/HIVE-24854?focusedWorklogId=582153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582153 ] ASF GitHub Bot logged work on HIVE-24854: - Author: ASF GitHub Bot Created on: 14/Apr/21 02:12 Start Date: 14/Apr/21 02:12 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2119: URL: https://github.com/apache/hive/pull/2119 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582153) Time Spent: 2h 20m (was: 2h 10m) > Incremental Materialized view refresh in presence of update/delete operations > - > > Key: HIVE-24854 > URL: https://issues.apache.org/jira/browse/HIVE-24854 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > The current implementation of incremental Materialized view rebuild cannot be used if any of > the Materialized view source tables has had an update or delete operation since the > last rebuild. In such cases a full rebuild should be performed. > Steps to enable incremental rebuild: > 1. Introduce a new virtual column to mark a row deleted > 2. Execute the query in the view definition > 2.a. Add a filter to each table scan in order to pull only the rows from each > source table which have a higher writeId than the writeId of the last rebuild > - this is already implemented by the current incremental rebuild > 2.b Add the row-is-deleted virtual column to each table scan. In join nodes, if > any of the branches has a deleted row, the result row is also deleted. > We should distinguish two types of view definition queries: with and without > Aggregate. > 3.a No aggregate path: > Rewrite the plan of the full rebuild to a multi insert statement with two > insert branches. One branch to insert new rows into the materialized view > table and the second one to insert deleted rows into the materialized view > delete delta. > 3.b Aggregate path: TBD > Prerequisite: > source tables haven't been compacted since the last MV rebuild -- This message was sent by Atlassian Jira (v8.3.4#803005)
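The no-aggregate rewrite described above (steps 2.a, 2.b, and 3.a) amounts to filtering source rows by writeId and routing each changed row into either the insert branch or the delete-delta branch of a multi-insert. A minimal, self-contained Java sketch of that routing logic follows; all names are invented for illustration and this is not Hive's implementation:

```java
import java.util.Arrays;
import java.util.List;

public class IncrementalMvRebuildSketch {
    static class Row {
        final long writeId;
        final boolean deleted; // stands in for the row-is-deleted virtual column (step 2.b)

        Row(long writeId, boolean deleted) {
            this.writeId = writeId;
            this.deleted = deleted;
        }
    }

    /**
     * Routes rows changed since the last rebuild into the two insert branches
     * of the multi-insert rewrite (step 3.a). Returns {inserts, deleteDelta}.
     */
    static int[] route(List<Row> rows, long lastRebuildWriteId) {
        int inserts = 0, deleteDelta = 0;
        for (Row r : rows) {
            if (r.writeId <= lastRebuildWriteId) {
                continue; // unchanged since the last rebuild, filtered out (step 2.a)
            }
            if (r.deleted) {
                deleteDelta++; // goes to the materialized view's delete delta
            } else {
                inserts++;     // new row, inserted into the materialized view table
            }
        }
        return new int[] { inserts, deleteDelta };
    }

    public static void main(String[] args) {
        List<Row> delta = Arrays.asList(new Row(5, false), new Row(11, false), new Row(12, true));
        int[] counts = route(delta, 10);
        System.out.println("insert branch: " + counts[0] + ", delete delta branch: " + counts[1]);
    }
}
```

The join behaviour in step 2.b corresponds to OR-ing the deleted flags of the joined inputs before a row reaches this routing step.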
[jira] [Work logged] (HIVE-24591) Move Beeline To SLF4J Simple Logger
[ https://issues.apache.org/jira/browse/HIVE-24591?focusedWorklogId=582119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582119 ] ASF GitHub Bot logged work on HIVE-24591: - Author: ASF GitHub Bot Created on: 14/Apr/21 00:18 Start Date: 14/Apr/21 00:18 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1833: URL: https://github.com/apache/hive/pull/1833 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582119) Time Spent: 2h 40m (was: 2.5h) > Move Beeline To SLF4J Simple Logger > --- > > Key: HIVE-24591 > URL: https://issues.apache.org/jira/browse/HIVE-24591 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > To make beeline as simple as possible, move its SLF4J logger implementation > to the SLF4J Simple logger. This will allow users to change the logging level > simply on the command line. Currently users must create a Log4J configuration > file, which is way too advanced/cumbersome for a data analyst who just wants > to use SQL (and do some minor troubleshooting) > {code:none} > export HADOOP_CLIENT_OPTS="-Dorg.slf4j.simpleLogger.defaultLogLevel=debug" > beeline ... > {code} > http://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24717) Migrate to listStatusIterator in moving files
[ https://issues.apache.org/jira/browse/HIVE-24717?focusedWorklogId=582118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582118 ] ASF GitHub Bot logged work on HIVE-24717: - Author: ASF GitHub Bot Created on: 14/Apr/21 00:18 Start Date: 14/Apr/21 00:18 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1934: URL: https://github.com/apache/hive/pull/1934 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582118) Time Spent: 1h (was: 50m) > Migrate to listStatusIterator in moving files > - > > Key: HIVE-24717 > URL: https://issues.apache.org/jira/browse/HIVE-24717 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Hive.java has various calls to the HDFS listStatus call when moving > files/directories around. These codepaths are used for insert overwrite > table/partition queries. > listStatus is a blocking call, whereas listStatusIterator is backed by a > RemoteIterator and fetches pages in the background. Hive should take > advantage of that, since Hadoop has recently implemented listStatusIterator for S3 > https://issues.apache.org/jira/browse/HADOOP-17074 -- This message was sent by Atlassian Jira (v8.3.4#803005)
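The benefit HIVE-24717 describes — replacing a blocking, all-at-once listing with page-at-a-time iteration — can be illustrated with a self-contained sketch. `PageSource` below is a hypothetical stand-in for one paged LIST call; Hadoop's real API is `FileSystem.listStatusIterator`, which returns a `RemoteIterator<FileStatus>`:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch of iterator-style listing: pages are fetched only as the caller advances. */
public class PagedListing {
    /** Hypothetical stand-in for one paged LIST call against remote storage. */
    interface PageSource {
        List<String> fetchPage(int pageIndex);
    }

    static Iterator<String> listIterator(PageSource source) {
        return new Iterator<String>() {
            int page = 0;
            boolean done = false;
            Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Fetch the next page lazily, only when the current one is exhausted.
                while (!done && !current.hasNext()) {
                    List<String> batch = source.fetchPage(page++);
                    if (batch.isEmpty()) {
                        done = true;
                    } else {
                        current = batch.iterator();
                    }
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> pages = java.util.Arrays.asList(
            java.util.Arrays.asList("f1", "f2"), java.util.Arrays.asList("f3"));
        Iterator<String> it = listIterator(
            i -> i < pages.size() ? pages.get(i) : Collections.<String>emptyList());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

A caller that stops early only pays for the pages it actually consumed, which is exactly why the iterator form helps on object stores like S3.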
[jira] [Work logged] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?focusedWorklogId=582074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582074 ] ASF GitHub Bot logged work on HIVE-25004: - Author: ASF GitHub Bot Created on: 13/Apr/21 21:41 Start Date: 13/Apr/21 21:41 Worklog Time Spent: 10m Work Description: mustafaiman commented on pull request #2170: URL: https://github.com/apache/hive/pull/2170#issuecomment-819072492 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582074) Time Spent: 0.5h (was: 20m) > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?focusedWorklogId=582073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582073 ] ASF GitHub Bot logged work on HIVE-24997: - Author: ASF GitHub Bot Created on: 13/Apr/21 21:40 Start Date: 13/Apr/21 21:40 Worklog Time Spent: 10m Work Description: mustafaiman commented on pull request #2166: URL: https://github.com/apache/hive/pull/2166#issuecomment-819072149 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582073) Time Spent: 0.5h (was: 20m) > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Since HIVE-24230 it assumes the UDF is evaluated on HS2, which is not true > in general. The SessionState is only available at compile time evaluation, but > later on a new interpreter should be instantiated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity
[ https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582072 ] ASF GitHub Bot logged work on HIVE-24914: - Author: ASF GitHub Bot Created on: 13/Apr/21 21:30 Start Date: 13/Apr/21 21:30 Worklog Time Spent: 10m Work Description: mustafaiman commented on pull request #2108: URL: https://github.com/apache/hive/pull/2108#issuecomment-819067487 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582072) Time Spent: 1h 20m (was: 1h 10m) > Improve LLAP scheduling by only traversing hosts with capacity > -- > > Key: HIVE-24914 > URL: https://issues.apache.org/jira/browse/HIVE-24914 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > *schedulePendingTasks* on the LlapTaskScheduler currently goes through all > the pending tasks and tries to allocate them based on their priority -- if a > priority cannot be scheduled completely, we bail out, as lower priorities > would not be able to get allocations either. > An optimization here could be to only walk through the nodes with capacity > (if any), and not all available hosts, when scheduling these tasks based on > their priority and locality preferences. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582068 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 21:19 Start Date: 13/Apr/21 21:19 Worklog Time Spent: 10m Work Description: mustafaiman commented on pull request #2123: URL: https://github.com/apache/hive/pull/2123#issuecomment-819061922 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582068) Time Spent: 1.5h (was: 1h 20m) > Optimize LlapTaskSchedulerService::preemptTasksFromMap > -- > > Key: HIVE-24472 > URL: https://issues.apache.org/jira/browse/HIVE-24472 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571! > speculativeTasks could possibly include node information to reduce CPU burn > in LlapTaskSchedulerService::preemptTasksFromMap > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
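A hedged sketch of the ticket's suggestion (types and names are hypothetical, not the actual speculativeTasks structure in Hive): index preemptable tasks by host, so that finding a preemption victim for a host-local request inspects only that host's entries instead of scanning every running task.

```java
import java.util.*;

// Hypothetical sketch of the ticket's suggestion; field and method names are
// illustrative, not the actual speculativeTasks structure in Hive.
class HostIndexedSpeculativeTasks {
    // host -> running preemptable tasks on that host, keyed by priority
    // (lower number = higher priority, so lastEntry() is the best victim).
    // Toy simplification: at most one task per priority per host.
    private final Map<String, TreeMap<Integer, String>> byHost = new HashMap<>();

    void add(String host, int priority, String taskId) {
        byHost.computeIfAbsent(host, h -> new TreeMap<>()).put(priority, taskId);
    }

    /**
     * Find a preemption victim on the given host with strictly lower priority
     * than the incoming request, touching only that host's tasks instead of
     * scanning the whole running-task map.
     */
    String pickVictim(String host, int incomingPriority) {
        TreeMap<Integer, String> tasks = byHost.get(host);
        if (tasks == null || tasks.isEmpty()) return null;
        Map.Entry<Integer, String> lowest = tasks.lastEntry();
        return lowest.getKey() > incomingPriority ? lowest.getValue() : null;
    }

    public static void main(String[] args) {
        HostIndexedSpeculativeTasks s = new HostIndexedSpeculativeTasks();
        s.add("h1", 5, "t-low");
        s.add("h2", 1, "t-high");
        if (!"t-low".equals(s.pickVictim("h1", 2))) throw new AssertionError();
        if (s.pickVictim("h2", 2) != null) throw new AssertionError(); // nothing lower-priority on h2
        System.out.println("ok");
    }
}
```

The per-host index is what removes the CPU burn the screenshot attributes to preemptTasksFromMap: the flat priority map forces a full scan even when the request only cares about one or two requested hosts.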
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582067 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 21:17 Start Date: 13/Apr/21 21:17 Worklog Time Spent: 10m Work Description: mustafaiman commented on a change in pull request #2123: URL: https://github.com/apache/hive/pull/2123#discussion_r612782613 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -429,6 +437,11 @@ public LlapTaskSchedulerService(TaskSchedulerContext taskSchedulerContext, Clock delayedTaskSchedulerExecutor = MoreExecutors.listeningDecorator(delayedTaskSchedulerExecutorRaw); +ExecutorService preemptTaskSchedulerExecutorRaw = Executors.newFixedThreadPool(1, Review comment: I checked that too and got confused. LlapTaskScheduler does the work of finding preemption candidates etc. even though preemption cannot occur in the end. Also, LlapTaskScheduler marks tasks as preempted and updates preemption stats even though nothing is preempted because LLAP_DAEMON_TASK_SCHEDULER_ENABLE_PREEMPTION is false. Am I understanding this correctly? This is obviously not a problem with this patch. I am just asking to understand. I'll +1 this regardless. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582067) Time Spent: 1h 20m (was: 1h 10m) > Optimize LlapTaskSchedulerService::preemptTasksFromMap > -- > > Key: HIVE-24472 > URL: https://issues.apache.org/jira/browse/HIVE-24472 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571! > speculativeTasks could possibly include node information to reduce CPU burn > in LlapTaskSchedulerService::preemptTasksFromMap > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal Vijayaraghavan updated HIVE-25011: Attachment: HIVE-25011.2.patch > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch, HIVE-25011.2.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582060 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:49 Start Date: 13/Apr/21 20:49 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2123: URL: https://github.com/apache/hive/pull/2123#discussion_r612766954 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -429,6 +437,11 @@ public LlapTaskSchedulerService(TaskSchedulerContext taskSchedulerContext, Clock delayedTaskSchedulerExecutor = MoreExecutors.listeningDecorator(delayedTaskSchedulerExecutorRaw); +ExecutorService preemptTaskSchedulerExecutorRaw = Executors.newFixedThreadPool(1, Review comment: Well, I agree, but the actual LLAP preemption conf we are using https://github.com/apache/hive/blob/61d5c641b2e414c7b7dfd92f2b402db3583507c8/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L5023 is actually targeting the TaskExecutorService within the LlapDaemon (waitQueue tasks vs running) and not the LlapTaskSchedulerService -- in a sense this is a different type of preemption, and I am not sure we should just use the same conf here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582060) Time Spent: 1h 10m (was: 1h) > Optimize LlapTaskSchedulerService::preemptTasksFromMap > -- > > Key: HIVE-24472 > URL: https://issues.apache.org/jira/browse/HIVE-24472 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571! > speculativeTasks could possibly include node information to reduce CPU burn > in LlapTaskSchedulerService::preemptTasksFromMap > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320533#comment-17320533 ] Gopal Vijayaraghavan commented on HIVE-25011: - {code} /** * Find whether we should execute the current query due to explain * @return true if the query needs to be executed, false if not */ public boolean isExplainSkipExecution() { return (explainConfig != null && explainConfig.getAnalyze() != AnalyzeState.RUNNING); } {code} Looks like the comment is actually wrong on the "return true" > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
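A minimal sketch of the intended behavior, assuming a simplified lock manager (ExplainLockGuard and LockManager are illustrative, not Hive's Driver/DbTxnManager API): when isExplainSkipExecution() is true the query is plan-only, so lock acquisition is skipped entirely and EXPLAIN UPDATE cannot conflict with a concurrent UPDATE.

```java
// Minimal sketch, not Hive's actual Driver/DbTxnManager code: the point is
// that lock acquisition is guarded by an "explain skip execution" check.
class ExplainLockGuard {
    interface LockManager { void acquireLocks(String query); }

    /** Mirrors the quoted helper: plan-only EXPLAIN (no ANALYZE running) skips execution. */
    static boolean isExplainSkipExecution(boolean isExplain, boolean analyzeRunning) {
        return isExplain && !analyzeRunning;
    }

    /** Returns true iff locks were acquired for this query. */
    static boolean lockIfNeeded(String query, boolean isExplain, boolean analyzeRunning, LockManager lm) {
        if (isExplainSkipExecution(isExplain, analyzeRunning)) {
            return false; // no locks: EXPLAIN UPDATE cannot conflict with a real UPDATE
        }
        lm.acquireLocks(query);
        return true;
    }

    public static void main(String[] args) {
        boolean[] locked = { false };
        LockManager lm = q -> locked[0] = true;
        if (lockIfNeeded("EXPLAIN UPDATE t SET x = 1", true, false, lm) || locked[0])
            throw new AssertionError("EXPLAIN must not lock");
        if (!lockIfNeeded("UPDATE t SET x = 1", false, false, lm) || !locked[0])
            throw new AssertionError("UPDATE must lock");
        System.out.println("ok");
    }
}
```

Note the EXPLAIN ANALYZE case is the exception: since it actually runs the query, the helper returns false and locks are still taken, which matches the AnalyzeState.RUNNING check in the quoted snippet.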
[jira] [Assigned] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal Vijayaraghavan reassigned HIVE-25011: --- Assignee: Gopal Vijayaraghavan > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal Vijayaraghavan updated HIVE-25011: Status: Patch Available (was: Open) > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal Vijayaraghavan updated HIVE-25011: Attachment: HIVE-25011.1.patch > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Priority: Major > Attachments: HIVE-25011.1.patch > > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582055 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:34 Start Date: 13/Apr/21 20:34 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2123: URL: https://github.com/apache/hive/pull/2123#discussion_r612758022 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -2324,7 +2278,114 @@ private void maybeAddToDelayedTaskQueue(TaskInfo taskInfo) { } } + private void maybeAddToHighPriorityTaskQueue(TaskInfo taskInfo) { +// Only add task if it's not already in the Queue AND there are no more than HOSTS tasks there already +// as we are performing up to HOSTS preemptions at a time +if (!taskInfo.isInHighPriorityQueue() && highPriorityTaskQueue.size() < activeInstances.size()) { + taskInfo.setInHighPriorityQueue(true); + highPriorityTaskQueue.add(taskInfo); +} + } + // -- Inner classes defined after this point -- + class PreemptionSchedulerCallable implements Callable<Void> { +private final AtomicBoolean isShutdown = new AtomicBoolean(false); + +@Override +public Void call() { + while (!isShutdown.get() && !Thread.currentThread().isInterrupted()) { +try { + TaskInfo taskInfo = getNextTask(); + // Tasks can exist in the queue even after they have been scheduled. + // Process task Preemption only if the task is still in PENDING state.
+ processTaskPreemption(taskInfo); + +} catch (InterruptedException e) { + if (isShutdown.get()) { +LOG.info("PreemptTaskScheduler thread interrupted after shutdown"); +break; + } else { +LOG.warn("PreemptTaskScheduler thread interrupted before being shutdown"); +throw new RuntimeException("PreemptTaskScheduler thread interrupted without being shutdown", e); + } +} + } + return null; +} + +private void processTaskPreemption(TaskInfo taskInfo) { + if (shouldAttemptTask(taskInfo) && tryTaskPreemption(taskInfo)) { +trySchedulingPendingTasks(); + } + // Enables scheduler to reAdd task in Queue if needed + taskInfo.setInHighPriorityQueue(false); +} + +private boolean tryTaskPreemption(TaskInfo taskInfo) { + // Find a lower priority task that can be preempted on a particular host. + // ONLY if there are no pending preemptions on that host, to avoid preempting twice for a task. + Set<String> potentialHosts = null; // null => preempt on any host. + readLock.lock(); + try { +// Protect against a bad location being requested. +if (taskInfo.requestedHosts != null && taskInfo.requestedHosts.length != 0) { + potentialHosts = Sets.newHashSet(taskInfo.requestedHosts); +} +if (potentialHosts != null) { + // Preempt on specific host + boolean shouldPreempt = true; + for (String host : potentialHosts) { +// Preempt only if there are no pending preemptions on the same host +// When the preemption registers, the request at the highest priority will be given the slot, +// even if the initial preemption was caused by some other task. +// TODO Maybe register which task the preemption was for, to avoid a bad non-local allocation. +MutableInt pendingHostPreemptions = pendingPreemptionsPerHost.get(host); +if (pendingHostPreemptions != null && pendingHostPreemptions.intValue() > 0) { + shouldPreempt = false; + LOG.debug("No preempt candidate for task={}. Found an existing preemption request on host={}, pendingPreemptionCount={}", + taskInfo.task, host, pendingHostPreemptions.intValue()); + break; +} + } + + if (!shouldPreempt) { +LOG.debug("No preempt candidate for {} on potential hosts={}. An existing preemption request exists", +taskInfo.task, potentialHosts); +return false; + } +} else { + // Unknown requested host -- Request for a preemption if there's none pending. If a single preemption is pending, + // and this is the next task to be assigned, it will be assigned once that slot becomes available. + if (pendingPreemptions.get() != 0) { +LOG.debug("Skipping preempt candidate since there are {} pending preemption requests. For task={}", +pendingPreemptions.get(), taskInfo); +return false; + } +} + +LOG.debug("Attempting preempt candidate for task={}, priority={} on
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582054=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582054 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:34 Start Date: 13/Apr/21 20:34 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2123: URL: https://github.com/apache/hive/pull/2123#discussion_r612757788 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -3049,7 +3131,7 @@ boolean isUpdateInProgress() { return isPendingUpdate; } -TezTaskAttemptID getAttemptId() { +synchronized TezTaskAttemptID getAttemptId() { Review comment: Ack removed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582054) Time Spent: 50m (was: 40m) > Optimize LlapTaskSchedulerService::preemptTasksFromMap > -- > > Key: HIVE-24472 > URL: https://issues.apache.org/jira/browse/HIVE-24472 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png > > Time Spent: 50m > Remaining Estimate: 0h > > !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571! > speculativeTasks could possibly include node information to reduce CPU burn > in LlapTaskSchedulerService::preemptTasksFromMap > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24472) Optimize LlapTaskSchedulerService::preemptTasksFromMap
[ https://issues.apache.org/jira/browse/HIVE-24472?focusedWorklogId=582052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582052 ] ASF GitHub Bot logged work on HIVE-24472: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:33 Start Date: 13/Apr/21 20:33 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2123: URL: https://github.com/apache/hive/pull/2123#discussion_r612757340 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -1954,6 +1911,37 @@ protected void schedulePendingTasks() throws InterruptedException { break; } } + // Finally take care of preemption requests that can unblock higher-pri tasks. + // This removes preemptable tasks from the runningList and sends out a preempt request to the system. + // Subsequent tasks will be scheduled once the de-allocate request for the preempted task is processed. + while (!preemptionCandidates.isEmpty()) { +TaskInfo toPreempt = preemptionCandidates.take(); +// 1. task has not terminated +if (toPreempt.isGuaranteed != null) { + String host = toPreempt.getAssignedNode().getHost(); + // 2. is currently assigned 3. no preemption pending on that Host + if (toPreempt.getState() == TaskInfo.State.ASSIGNED && + (pendingPreemptionsPerHost.get(host) == null || pendingPreemptionsPerHost.get(host).intValue() == 0)) { +LOG.debug("Preempting task took {} ms {}", (clock.getTime() - toPreempt.getPreemptedTime()), toPreempt); Review comment: Left it mostly to see how fast preemption messages are propagated, but it's not super useful, agreed -- removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582052) Time Spent: 40m (was: 0.5h) > Optimize LlapTaskSchedulerService::preemptTasksFromMap > -- > > Key: HIVE-24472 > URL: https://issues.apache.org/jira/browse/HIVE-24472 > Project: Hive > Issue Type: Sub-task >Reporter: Rajesh Balamohan >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-12-03 at 12.13.03 PM.png > > Time Spent: 40m > Remaining Estimate: 0h > > !Screenshot 2020-12-03 at 12.13.03 PM.png|width=1063,height=571! > speculativeTasks could possibly include node information to reduce CPU burn > in LlapTaskSchedulerService::preemptTasksFromMap > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity
[ https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582047 ] ASF GitHub Bot logged work on HIVE-24914: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:29 Start Date: 13/Apr/21 20:29 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2108: URL: https://github.com/apache/hive/pull/2108#discussion_r612754974 ## File path: llap-tez/src/test/org/apache/hadoop/hive/llap/tezplugins/TestLlapTaskSchedulerService.java ## @@ -1115,7 +1116,7 @@ public void testHostPreferenceMissesConsistentPartialAlive() throws IOException, // 3rd task requested host3, got host1 since host3 is dead and host4 is full assertEquals(HOST1, argumentCaptor2.getAllValues().get(2).getNodeId().getHost()); - verify(tsWrapper.mockServiceInstanceSet, times(2)).getAllInstancesOrdered(true); + verify(tsWrapper.mockServiceInstanceSet, atLeast(2)).getAllInstancesOrdered(true); Review comment: Good catch. The main reason this was converted to atLeast is that before, getAllInstancesOrdered was only called when a Task had to roll over to the next node because that node was disabled, as in the test above. Now, however, on every scheduler call we get the alive hosts in order (using getAllInstancesOrdered when using consistent hashing) -- the problem here is that we don't know how many times the scheduling loop will be called before the test finishes -- thus changed to atLeast(2), the minimum needed for the requests to make progress. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582047) Time Spent: 1h 10m (was: 1h) > Improve LLAP scheduling by only traversing hosts with capacity > -- > > Key: HIVE-24914 > URL: https://issues.apache.org/jira/browse/HIVE-24914 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > *schedulePendingTasks* on the LlapTaskScheduler currently goes through all > the pending tasks and tries to allocate them based on their Priority -- if a > priority can not be scheduled completely, we bail out as lower priorities > would not be able to get allocations either. > An optimization here could be to only walk through the nodes with capacity > (if any) ,and not all available hosts, for scheduling these tasks based on > their priority and locality preferences. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity
[ https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582041 ] ASF GitHub Bot logged work on HIVE-24914: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:15 Start Date: 13/Apr/21 20:15 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2108: URL: https://github.com/apache/hive/pull/2108#discussion_r612746804 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -1536,20 +1492,24 @@ private SelectHostResult selectHost(TaskInfo request) { } // requested host is still alive but cannot accept task, pick the next available host in consistent order - for (int i = 0; i < allNodes.size(); i++) { -NodeInfo nodeInfo = allNodes.get((i + requestedHostIdx + 1) % allNodes.size()); -// next node in consistent order died or does not have free slots, rollover to next -if (nodeInfo == null || !nodeInfo.canAcceptTask()) { - continue; -} else { - if (LOG.isDebugEnabled()) { -LOG.debug("Assigning {} in consistent order when looking for first requested host, from #hosts={}," -+ " requestedHosts={}", nodeInfo.toShortString(), allNodes.size(), - ((requestedHosts == null || requestedHosts.length == 0) ? "null" : -requestedHostsDebugStr)); + if (!activeNodesWithFreeSlots.isEmpty()) { +NodeInfo nextSlot = null; +boolean found = false; +for (Entry<String, List<NodeInfo>> entry : availableHostMap.entrySet()) { + if (found && !entry.getValue().isEmpty()) { +nextSlot = entry.getValue().iterator().next(); +break; } - return new SelectHostResult(nodeInfo); + if (entry.getKey().equals(firstRequestedHost)) found = true; +} +// rollover +if (nextSlot == null) nextSlot = activeNodesWithFreeSlots.stream().findFirst().get(); +if (LOG.isDebugEnabled()) { + LOG.debug("Assigning {} in consistent order when looking for first requested host, from #hosts={}," Review comment: Sure, fixed. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582041) Time Spent: 1h (was: 50m) > Improve LLAP scheduling by only traversing hosts with capacity > -- > > Key: HIVE-24914 > URL: https://issues.apache.org/jira/browse/HIVE-24914 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > *schedulePendingTasks* on the LlapTaskScheduler currently goes through all > the pending tasks and tries to allocate them based on their Priority -- if a > priority can not be scheduled completely, we bail out as lower priorities > would not be able to get allocations either. > An optimization here could be to only walk through the nodes with capacity > (if any) ,and not all available hosts, for scheduling these tasks based on > their priority and locality preferences. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24914) Improve LLAP scheduling by only traversing hosts with capacity
[ https://issues.apache.org/jira/browse/HIVE-24914?focusedWorklogId=582039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582039 ] ASF GitHub Bot logged work on HIVE-24914: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:12 Start Date: 13/Apr/21 20:12 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2108: URL: https://github.com/apache/hive/pull/2108#discussion_r612744867 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -1816,23 +1776,90 @@ private static boolean removeFromRunningTaskMap(TreeMap ... + private Pair<Resource, Map<String, List<NodeInfo>>> getResourceAvailability() { +int memory = 0; +int vcores = 0; +int numInstancesFound = 0; +Map<String, List<NodeInfo>> availableHostMap; +readLock.lock(); +try { + // maintain insertion order (needed for Next slot in locality miss) + availableHostMap = new LinkedHashMap<>(instanceToNodeMap.size()); + Collection<LlapServiceInstance> instances = consistentSplits ? + // might also include Inactive instances + activeInstances.getAllInstancesOrdered(true): + // if consistent splits are NOT used we don't need the ordering as there will be no cache benefit anyways + activeInstances.getAll(); + boolean foundSlot = false; + for (LlapServiceInstance inst : instances) { +NodeInfo nodeInfo = instanceToNodeMap.get(inst.getWorkerIdentity()); +if (nodeInfo != null) { + List<NodeInfo> hostList = availableHostMap.get(nodeInfo.getHost()); + if (hostList == null) { +hostList = new ArrayList<>(); +availableHostMap.put(nodeInfo.getHost(), hostList); + } + if (!(inst instanceof InactiveServiceInstance)) { +Resource r = inst.getResource(); +memory += r.getMemory(); +vcores += r.getVirtualCores(); +numInstancesFound++; +// Only add to List Nodes with available resources +// Hosts, however, exist even for nodes that do not currently have resources +if (nodeInfo.canAcceptTask()) { + foundSlot = true; + hostList.add(nodeInfo); +} + } +} + } + // isClusterCapacityFull will be set to false on every trySchedulingPendingTasks call + 
// set it false here to bail out early when we know there are no resources available. + if (!foundSlot) { +isClusterCapacityFull.set(true); + } +} finally { + readLock.unlock(); +} +if (LOG.isDebugEnabled()) { + LOG.debug("ResourceAvail: numInstancesFound={}, totalMem={}, totalVcores={} availableHosts: {}", Review comment: sure fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582039) Time Spent: 50m (was: 40m) > Improve LLAP scheduling by only traversing hosts with capacity > -- > > Key: HIVE-24914 > URL: https://issues.apache.org/jira/browse/HIVE-24914 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > *schedulePendingTasks* on the LlapTaskScheduler currently goes through all > the pending tasks and tries to allocate them based on their Priority -- if a > priority can not be scheduled completely, we bail out as lower priorities > would not be able to get allocations either. > An optimization here could be to only walk through the nodes with capacity > (if any) ,and not all available hosts, for scheduling these tasks based on > their priority and locality preferences. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into
[ https://issues.apache.org/jira/browse/HIVE-24947?focusedWorklogId=582037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-582037 ] ASF GitHub Bot logged work on HIVE-24947: - Author: ASF GitHub Bot Created on: 13/Apr/21 20:09 Start Date: 13/Apr/21 20:09 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #2176: URL: https://github.com/apache/hive/pull/2176 ### What changes were proposed in this pull request? Make sure Parquet values are decoded on the fly, as each Page can decide to encode values or not. As a result we might end up with a VRB where half the values are encoded and the rest are not! ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? TODO -- add q.test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 582037) Remaining Estimate: 0h Time Spent: 10m > Casting exception when reading vectorized parquet file for insert into > -- > > Key: HIVE-24947 > URL: https://issues.apache.org/jira/browse/HIVE-24947 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Marton Bod >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We have two parquet tables (target and source). 
> Upon running the query: > {code:java} > set hive.vectorized.execution.enabled=true; > insert into target2 partition(part_col_1, part_col_2) select * from > source;{code} > The following exception is thrown: > {code:java} > Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to > [B > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > ... 24 more > {code} > The same runs without problems when vectorization is turned off. > cc [~nareshpr] -- This message was sent by Atlassian Jira (v8.3.4#803005)
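The failure mode can be illustrated with a toy decoder (the types are simplified stand-ins, not the actual VectorizedListColumnReader API): within one row batch, one Parquet page may be dictionary-encoded (values arrive as Integer dictionary ids) while the next page is plain-encoded (values arrive as byte[]), so a blanket cast to byte[] throws the ClassCastException above unless the encoding is resolved per page.

```java
import java.util.*;

// Toy illustration of the bug's mechanics; types are simplified stand-ins,
// not the actual Parquet reader classes from the stack trace.
class MixedPageDecode {
    static final byte[][] DICT = { "a".getBytes(), "b".getBytes() };

    // Each page independently chooses dictionary encoding (Integer ids) or
    // plain encoding (byte[]), so the decode decision must be made per page.
    static List<byte[]> decodeBatch(List<Object[]> pages, List<Boolean> isDictPage) {
        List<byte[]> out = new ArrayList<>();
        for (int p = 0; p < pages.size(); p++) {
            for (Object v : pages.get(p)) {
                // An unconditional (byte[]) cast here on a dictionary page is
                // exactly the "Integer cannot be cast to [B" from the report.
                out.add(isDictPage.get(p) ? DICT[(Integer) v] : (byte[]) v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One batch spanning a dictionary page followed by a plain page.
        List<Object[]> pages = List.of(new Object[]{0, 1}, new Object[]{"c".getBytes()});
        List<byte[]> vals = decodeBatch(pages, List.of(true, false));
        if (!"a".equals(new String(vals.get(0))) || !"c".equals(new String(vals.get(2))))
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

This matches the PR description's point about decoding on the fly: a batch-level "all values are dictionary ids" assumption breaks as soon as the writer switches encodings mid-column.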
[jira] [Updated] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into
[ https://issues.apache.org/jira/browse/HIVE-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24947: -- Labels: pull-request-available (was: ) > Casting exception when reading vectorized parquet file for insert into > -- > > Key: HIVE-24947 > URL: https://issues.apache.org/jira/browse/HIVE-24947 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We have two parquet tables (target and source). > Upon running the query: > {code:java} > set hive.vectorized.execution.enabled=true; > insert into target2 partition(part_col_1, part_col_2) select * from > source;{code} > The following exception is thrown: > {code:java} > Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to > [B > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > ... 24 more > {code} > The same runs without problems when vectorization is turned off. > cc [~nareshpr] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24947) Casting exception when reading vectorized parquet file for insert into
[ https://issues.apache.org/jira/browse/HIVE-24947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24947: - Assignee: Panagiotis Garefalakis > Casting exception when reading vectorized parquet file for insert into > -- > > Key: HIVE-24947 > URL: https://issues.apache.org/jira/browse/HIVE-24947 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Marton Bod >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We have two parquet tables (target and source). > Upon running the query: > {code:java} > set hive.vectorized.execution.enabled=true; > insert into target2 partition(part_col_1, part_col_2) select * from > source;{code} > The following exception is thrown: > {code:java} > Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to > [B > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.fillColumnVector(VectorizedListColumnReader.java:308) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.convertValueListToListColumnVector(VectorizedListColumnReader.java:342) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedListColumnReader.readBatch(VectorizedListColumnReader.java:91) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:433) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:376) > at > org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:99) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > ... 24 more > {code} > The same runs without problems when vectorization is turned off. > cc [~nareshpr] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25011) Concurrency: Do not acquire locks for EXPLAIN
[ https://issues.apache.org/jira/browse/HIVE-25011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal Vijayaraghavan updated HIVE-25011: Description: {code} EXPLAIN UPDATE ... {code} should not be in conflict with another active ongoing UPDATE operation. was: {code} EXPLAIN UPDATE ... {code} should be in conflict with another active ongoing UPDATE operation. > Concurrency: Do not acquire locks for EXPLAIN > - > > Key: HIVE-25011 > URL: https://issues.apache.org/jira/browse/HIVE-25011 > Project: Hive > Issue Type: Improvement > Components: Locking, Transactions >Affects Versions: 4.0.0 >Reporter: Gopal Vijayaraghavan >Priority: Major > > {code} > EXPLAIN UPDATE ... > {code} > should not be in conflict with another active ongoing UPDATE operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
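The improvement above boils down to a simple predicate: a plain EXPLAIN of a DML statement only compiles the plan and never writes data, so it cannot conflict with a concurrent UPDATE and need not enter the lock manager. A minimal sketch of that decision follows; the names (`needsLocks`, `isExplainOnly`, `isDml`) are illustrative, not Hive's actual API.

```java
final class LockDecision {
    /**
     * Decide whether a statement must acquire transactional locks.
     * EXPLAIN-only compilation is read-only planning, so it skips locking;
     * real DML still locks to serialize against concurrent writers.
     */
    static boolean needsLocks(boolean isExplainOnly, boolean isDml) {
        if (isExplainOnly) {
            return false; // plan compilation only, nothing to protect
        }
        return isDml; // actual UPDATE/DELETE/INSERT still takes locks
    }
}
```

With this rule, `EXPLAIN UPDATE ...` proceeds immediately even while another UPDATE holds its locks, which is the behavior the issue asks for.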
[jira] [Assigned] (HIVE-25010) Create qtest-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér reassigned HIVE-25010: > Create qtest-iceberg module > --- > > Key: HIVE-25010 > URL: https://issues.apache.org/jira/browse/HIVE-25010 > Project: Hive > Issue Type: Test >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > > We should create a qtest-iceberg module under itests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?focusedWorklogId=581988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581988 ] ASF GitHub Bot logged work on HIVE-25009: - Author: ASF GitHub Bot Created on: 13/Apr/21 19:12 Start Date: 13/Apr/21 19:12 Worklog Time Spent: 10m Work Description: asinkovits opened a new pull request #2175: URL: https://github.com/apache/hive/pull/2175 …PE if the COMPACTION_QUEUE is empty ### What changes were proposed in this pull request? Null check ### Why are the changes needed? Version check can cause NPE if the queue is empty. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual tests were conducted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581988) Remaining Estimate: 0h Time Spent: 10m > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 4.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
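The "Null check" mentioned in the PR description addresses a common shape of bug: when COMPACTION_QUEUE is empty, the metastore lookup yields null instead of a version string, and calling `equals()` on that result throws a NullPointerException. A null-tolerant comparison sidesteps it; the sketch below is illustrative (the method name `versionMatches` is invented here, not Hive's actual code).

```java
import java.util.Objects;

final class VersionCheck {
    /**
     * Compare the stored worker/initiator version with the runtime version.
     * Objects.equals handles a null stored version (empty queue) safely,
     * where storedVersion.equals(runtimeVersion) would throw an NPE.
     */
    static boolean versionMatches(String storedVersion, String runtimeVersion) {
        return Objects.equals(storedVersion, runtimeVersion);
    }
}
```

An empty queue then simply reports a mismatch instead of crashing the version check.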
[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25009: -- Labels: pull-request-available (was: ) > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 4.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-25009: --- Affects Version/s: 4.0.0 > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 4.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25009 started by Antal Sinkovits. -- > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits reassigned HIVE-25009: -- > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Sinkovits updated HIVE-25009: --- Component/s: Transactions > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581961 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 18:41 Start Date: 13/Apr/21 18:41 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612691469 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro Review comment: I think avro is not just a test dependency - can move this above the comment? -- This is an automated message from the Apache Git Service. To respond to
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581960 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 18:40 Start Date: 13/Apr/21 18:40 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612690771 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server Review comment: I think we only used the `metastore-common` dependency - do we need to declare the server as well? 
it's listed below with the `tests` classifier already -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581960) Time Spent: 3h (was: 2h 50m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581959 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 18:39 Start Date: 13/Apr/21 18:39 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612690167 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service Review comment: hive-service and avro are not just test dependencies -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581959) Time Spent: 2h 50m (was: 2h 40m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581958 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 18:38 Start Date: 13/Apr/21 18:38 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612689560 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core Review comment: I think we're getting this via
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581949 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 18:30 Start Date: 13/Apr/21 18:30 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612684745 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + +
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581933 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 17:56 Start Date: 13/Apr/21 17:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612662787 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + + + 
+ + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + + +
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581858 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:38 Start Date: 13/Apr/21 15:38 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612561610 ## File path: pom.xml ## @@ -64,8 +64,7 @@ standalone-metastore upgrade-acid kafka-handler -iceberg-handler -iceberg-catalog +iceberg Review comment: Great, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581858) Time Spent: 2h 10m (was: 2h) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581856 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:32 Start Date: 13/Apr/21 15:32 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612556987 ## File path: pom.xml ## @@ -64,8 +64,7 @@ standalone-metastore upgrade-acid kafka-handler -iceberg-handler -iceberg-catalog +iceberg Review comment: Applied my other patches so there is a new class in `patched-iceberg-api` (`CommitStateUnknownException`) Run the packaging and checked the `packaging/target/.../lib` and it still contains the libs: ``` $ cd packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/ $ ls *iceberg* hive-iceberg-catalog-4.0.0-SNAPSHOT.jar hive-iceberg-handler-4.0.0-SNAPSHOT.jar ``` The new class is in the jar: ``` $ zip -sf hive-iceberg-handler-4.0.0-SNAPSHOT.jar |grep CommitStateUnknownException org/apache/hive/iceberg/org/apache/iceberg/exceptions/CommitStateUnknownException.class ``` So this seems ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581856) Time Spent: 2h (was: 1h 50m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581853 ] ASF GitHub Bot logged work on HIVE-24928: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:27 Start Date: 13/Apr/21 15:27 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2111: URL: https://github.com/apache/hive/pull/2111#discussion_r612552854 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java ## @@ -422,7 +422,7 @@ private String extractTableFullName(StatsTask tsk) throws SemanticException { TableSpec tableSpec = new TableSpec(table, partitions); tableScan.getConf().getTableMetadata().setTableSpec(tableSpec); -if (BasicStatsNoJobTask.canUseFooterScan(table, inputFormat)) { +if (BasicStatsNoJobTask.canUseColumnStats(table, inputFormat)) { Review comment: Right, fixed it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581853) Time Spent: 6h 10m (was: 6h) > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. 
> In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581849 ] ASF GitHub Bot logged work on HIVE-24928: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:26 Start Date: 13/Apr/21 15:26 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2111: URL: https://github.com/apache/hive/pull/2111#discussion_r612551277

## File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java

@@ -153,6 +156,37 @@ public DecomposedPredicate decomposePredicate(JobConf jobConf, Deserializer dese
     return predicate;
   }

+  @Override
+  public boolean canProvideBasicStatistics() {
+    return true;
+  }
+
+  @Override
+  public Map<String, String> getBasicStatistics(TableDesc tableDesc) {
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    Map<String, String> stats = new HashMap<>();
+    if (table.currentSnapshot() != null) {
+      Map<String, String> summary = table.currentSnapshot().summary();
+      if (summary != null) {
+        if (summary.containsKey(SnapshotSummary.TOTAL_DATA_FILES_PROP)) {
+          stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
+        }
+        if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
+          stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+        }
+        // TODO: add TOTAL_SIZE when iceberg 0.12 is released
+        if (summary.containsKey("total-files-size")) {
+          stats.put(StatsSetupConst.TOTAL_SIZE, summary.get("total-files-size"));
+        }
+      }
+    } else {
+      stats.put(StatsSetupConst.NUM_FILES, "0");

Review comment: In the case of an empty table, the current snapshot is null. I thought setting all the basic stats to 0 is the right approach since we don't have any data. When the summary of the snapshot is not available I return an empty statistics map. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581849) Time Spent: 6h (was: 5h 50m) > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6h > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. > In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
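The semantics discussed in the review comment above (no snapshot → explicit zeros, snapshot without a summary → empty map) can be restated as a standalone function. The string literals below are assumed to mirror Iceberg's SnapshotSummary and Hive's StatsSetupConst constants; this is a sketch of the decision logic, not the actual handler code:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotStatsSketch {
    // Assumed string values of the constants referenced in the quoted diff.
    static final String TOTAL_DATA_FILES_PROP = "total-data-files"; // SnapshotSummary.TOTAL_DATA_FILES_PROP
    static final String TOTAL_RECORDS_PROP = "total-records";       // SnapshotSummary.TOTAL_RECORDS_PROP
    static final String NUM_FILES = "numFiles";                     // StatsSetupConst.NUM_FILES
    static final String ROW_COUNT = "numRows";                      // StatsSetupConst.ROW_COUNT

    // hasSnapshot == false: empty table, report explicit zeros.
    // summary == null: snapshot exists but carries no summary, report nothing.
    static Map<String, String> toBasicStats(boolean hasSnapshot, Map<String, String> summary) {
        Map<String, String> stats = new HashMap<>();
        if (!hasSnapshot) {
            stats.put(NUM_FILES, "0");
            stats.put(ROW_COUNT, "0");
            return stats;
        }
        if (summary == null) {
            return stats; // empty statistics map
        }
        if (summary.containsKey(TOTAL_DATA_FILES_PROP)) {
            stats.put(NUM_FILES, summary.get(TOTAL_DATA_FILES_PROP));
        }
        if (summary.containsKey(TOTAL_RECORDS_PROP)) {
            stats.put(ROW_COUNT, summary.get(TOTAL_RECORDS_PROP));
        }
        return stats;
    }
}
```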
[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=581843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581843 ] ASF GitHub Bot logged work on HIVE-24928: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:19 Start Date: 13/Apr/21 15:19 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2111: URL: https://github.com/apache/hive/pull/2111#discussion_r612545094

## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java

@@ -119,16 +129,83 @@ public String getName() {
     return "STATS-NO-JOB";
   }

-  static class StatItem {
-    Partish partish;
-    Map<String, String> params;
-    Object result;
+  abstract static class StatCollector implements Runnable {
+
+    protected Partish partish;
+    protected Object result;
+    protected LogHelper console;
+
+    public static Function<StatCollector, String> SIMPLE_NAME_FUNCTION =
+        sc -> String.format("%s#%s", sc.partish().getTable().getCompleteName(), sc.partish().getPartishType());
+
+    public static Function<StatCollector, Partition> EXTRACT_RESULT_FUNCTION = sc -> (Partition) sc.result();
+
+    abstract Partish partish();
+    abstract boolean isValid();
+    abstract Object result();
+    abstract void init(HiveConf conf, LogHelper console) throws IOException;
+
+    protected String toString(Map<String, String> parameters) {
+      return StatsSetupConst.SUPPORTED_STATS.stream().map(st -> st + "=" + parameters.get(st))
+          .collect(Collectors.joining(", "));
+    }
   }

-  static class FooterStatCollector implements Runnable {
+  static class HiveStorageHandlerStatCollector extends StatCollector {
+
+    public HiveStorageHandlerStatCollector(Partish partish) {
+      this.partish = partish;
+    }
+
+    @Override
+    public void init(HiveConf conf, LogHelper console) throws IOException {
+      this.console = console;
+    }
+
+    @Override
+    public void run() {
+      try {
+        Table table = partish.getTable();
+        Map<String, String> parameters = partish.getPartParameters();
+        TableDesc tableDesc = Utilities.getTableDesc(table);
+        Map<String, String> basicStatistics = table.getStorageHandler().getBasicStatistics(tableDesc);

Review 
comment: Correct, I missed that. I will provide the `partish` object which is enough to calculate the table/partition stats on StorageHandler side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581843) Time Spent: 5h 50m (was: 5h 40m) > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. > In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581835 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:05 Start Date: 13/Apr/21 15:05 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612533431 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + +
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581830 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 15:00 Start Date: 13/Apr/21 15:00 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612529170 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + +
[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table
[ https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=581828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581828 ] ASF GitHub Bot logged work on HIVE-24974: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:54 Start Date: 13/Apr/21 14:54 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2148: URL: https://github.com/apache/hive/pull/2148#discussion_r612517973

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

@@ -3002,6 +3002,22 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
         "Enables read-only transaction classification and related optimizations"),
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+    HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", "7200s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+    HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval", "30s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Reporting period for ACID metrics in seconds."),
+    HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold", 100,
+        "Threshold for the number of delta files to include in the ACID metrics report."),
+    HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.obsolete.delta.num.threshold", 100,

Review comment: Similar to above

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

@@ -3002,6 +3002,22 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
         "Enables read-only transaction classification and related optimizations"),
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+    HIVE_TXN_ACID_METRICS_CACHE_DURATION("hive.txn.acid.metrics.cache.duration", "7200s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Maximum lifetime in seconds for an entry in the ACID metrics cache."),
+    HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL("hive.txn.acid.metrics.reporting.interval", "30s",
+        new TimeValidator(TimeUnit.SECONDS),
+        "Reporting period for ACID metrics in seconds."),
+    HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD("hive.txn.acid.metrics.delta.num.threshold", 100,

Review comment: Isn't this the minimum number of active delta files a table/partition must have to be included in the report?

## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java

@@ -43,17 +44,18 @@
 import org.apache.hadoop.hive.metastore.metrics.Metrics;
 import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
 import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.ql.txn.compactor.metrics.DeltaFilesMetricReporter;
+import org.apache.tez.common.counters.TezCounters;
 import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Objects;
+import java.util.*;

Review comment: I think it's not recommended to import all classes in the package

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

@@ -3002,6 +3002,22 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
         "Enables read-only transaction classification and related optimizations"),

Review comment: Here I would add a comment like: Configs having to do with DeltaFilesMetricReporter, which collects lists of the most recently active tables with the largest number of active/obsolete deltas.

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java

@@ -3002,6 +3002,22 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_TXN_READONLY_ENABLED("hive.txn.readonly.enabled", false,
         "Enables read-only transaction classification and related optimizations"),
+    HIVE_TXN_ACID_METRICS_MAX_CACHE_SIZE("hive.txn.acid.metrics.max.cache.size", 100,
+        "Size of the ACID metrics cache. Only topN metrics would remain in the cache if exceeded."),
+
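The cache semantics these new configs describe (keep at most hive.txn.acid.metrics.max.cache.size entries, evicting beyond that) can be sketched with a plain LinkedHashMap. The real reporter presumably also honours the 7200s cache.duration TTL, which this sketch omits; it is not Hive's DeltaFilesMetricReporter implementation, only its eviction idea:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy size-bounded cache mirroring hive.txn.acid.metrics.max.cache.size:
// once the bound is exceeded, the least recently touched entry is dropped.
public class BoundedMetricsCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public BoundedMetricsCache(int maxSize) {
        // accessOrder = true, so iteration order (and eviction) follows recency of use
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

A cache built with `new BoundedMetricsCache<String, Integer>(100)` would then track at most 100 table-to-delta-count entries, matching the config's default.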
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581827 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:53 Start Date: 13/Apr/21 14:53 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612522412

## File path: pom.xml

@@ -64,8 +64,7 @@
     <module>standalone-metastore</module>
     <module>upgrade-acid</module>
     <module>kafka-handler</module>
-    <module>iceberg-handler</module>
-    <module>iceberg-catalog</module>
+    <module>iceberg</module>

Review comment: That's a good question! I will check... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581827) Time Spent: 1.5h (was: 1h 20m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581826 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:52 Start Date: 13/Apr/21 14:52 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612521207

## File path: pom.xml

@@ -64,8 +64,7 @@
     <module>standalone-metastore</module>
     <module>upgrade-acid</module>
     <module>kafka-handler</module>
-    <module>iceberg-handler</module>
-    <module>iceberg-catalog</module>
+    <module>iceberg</module>

Review comment: Just to check, after this refactor, the handler and catalog jars are still packaged into the `packaging/target/.../lib` directory? How about the patched jars (core and api)? Are they correctly included into the shaded handler jar with the patched classfiles, but not included into the packaging `/lib`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581826) Time Spent: 1h 20m (was: 1h 10m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581825 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:52 Start Date: 13/Apr/21 14:52 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612521207

## File path: pom.xml

@@ -64,8 +64,7 @@
     <module>standalone-metastore</module>
     <module>upgrade-acid</module>
     <module>kafka-handler</module>
-    <module>iceberg-handler</module>
-    <module>iceberg-catalog</module>
+    <module>iceberg</module>

Review comment: Just to check, after this refactor, the handler and catalog jars are still packaged into the `packaging/target/.../lib` directory? How about the patched jars (core and api)? Are they correctly included into the shaded handler jar with the patched classfiles, but the patched jars are not included into the packaging `/lib`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581825) Time Spent: 1h 10m (was: 1h) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581822 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:50 Start Date: 13/Apr/21 14:50 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612519866 ## File path: iceberg/patched-iceberg-core/pom.xml ## @@ -0,0 +1,80 @@ + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive-iceberg +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +org.apache.iceberg +iceberg-core +patched-${iceberg-api.version}-${parent.version} +Patched Iceberg Core + + + + + + + + + + + +../.. +.. + + + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-common +${iceberg-api.version} + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + + + + +org.apache.maven.plugins +maven-dependency-plugin + + +unpack +generate-sources + +unpack + + + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} +jar +true + ${project.build.directory}/classes + Review comment: Those will come with the individual changes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581822) Time Spent: 1h (was: 50m) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581821 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:49 Start Date: 13/Apr/21 14:49 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612518701 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + + + 
+ + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + + +
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581820 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:48 Start Date: 13/Apr/21 14:48 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612517993 ## File path: iceberg/patched-iceberg-core/pom.xml ## @@ -0,0 +1,80 @@ + +http://maven.apache.org/POM/4.0.0; + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive-iceberg +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +org.apache.iceberg +iceberg-core +patched-${iceberg-api.version}-${parent.version} +Patched Iceberg Core + + + + + + + + + + + +../.. +.. + + + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-common +${iceberg-api.version} + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + + + + +org.apache.maven.plugins +maven-dependency-plugin + + +unpack +generate-sources + +unpack + + + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} +jar +true + ${project.build.directory}/classes + Review comment: I think I'm missing some step here: don't we need to list the SnapshotSummary class to be replaced here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581820) Time Spent: 40m (was: 0.5h) > Move iceberg-handler under a hive-iceberg module > > > Key: HIVE-25003 > URL: https://issues.apache.org/jira/browse/HIVE-25003 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > We should create a new {{hive-iceberg}} module and put {{iceberg-handler}} > and subsequent iceberg modules under this module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25003) Move iceberg-handler under a hive-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25003?focusedWorklogId=581816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581816 ] ASF GitHub Bot logged work on HIVE-25003: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:44 Start Date: 13/Apr/21 14:44 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2169: URL: https://github.com/apache/hive/pull/2169#discussion_r612513676 ## File path: iceberg/pom.xml ## @@ -0,0 +1,325 @@ + + +http://www.w3.org/2001/XMLSchema-instance; + xmlns="http://maven.apache.org/POM/4.0.0; + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + +org.apache.hive +hive +4.0.0-SNAPSHOT +../pom.xml + +4.0.0 + +hive-iceberg +4.0.0-SNAPSHOT +pom +Hive Iceberg Modules + + +.. +. +0.11.0 +4.0.2 +1.9.2 +4.0.2 + 3.1.2 +2.5.0 + + + + + +iceberg-catalog +iceberg-handler + + + + + +org.apache.iceberg +iceberg-api +${iceberg-api.version} + + +org.apache.iceberg +iceberg-core +${iceberg-api.version} + + +org.apache.iceberg +iceberg-hive-metastore +${iceberg-api.version} + + +org.apache.iceberg +iceberg-data +${iceberg-api.version} + + +org.apache.iceberg +iceberg-parquet +${iceberg-api.version} + + +org.apache.iceberg +iceberg-orc +${iceberg-api.version} + + + +org.apache.hive +hive-iceberg-catalog +${project.version} + + + +org.apache.hive +hive-exec +${project.version} + + +com.google.code.findbugs +jsr305 + + +com.google.guava +* + + +com.google.protobuf +protobuf-java + + +org.apache.avro +avro + + +org.apache.calcite.avatica +* + + +org.apache.hive +hive-llap-tez + + +org.apache.logging.log4j +* + + +org.pentaho +* + + +org.slf4j +slf4j-log4j12 + + + + +org.apache.hive +hive-serde +${project.version} + + +org.apache.hive +hive-standalone-metastore-server +${project.version} + + +org.apache.hive +hive-standalone-metastore-common +${project.version} + + + +org.apache.hadoop +hadoop-client +${hadoop.version} + + +org.apache.avro +avro + + 
+ + + + +org.apache.hive +hive-service +${project.version} + + +org.apache.hive +hive-exec + + + + +org.apache.hive +hive-standalone-metastore-server +tests +${project.version} + + +org.apache.hive +hive-iceberg-catalog +tests +${project.version} + + + +org.apache.avro +avro +${iceberg.avro.version} + + +org.apache.orc +orc-core +${orc.version} + +
[jira] [Work logged] (HIVE-24978) Optimise number of DROP_PARTITION events created.
[ https://issues.apache.org/jira/browse/HIVE-24978?focusedWorklogId=581801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581801 ] ASF GitHub Bot logged work on HIVE-24978: - Author: ASF GitHub Bot Created on: 13/Apr/21 14:15 Start Date: 13/Apr/21 14:15 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #2154: URL: https://github.com/apache/hive/pull/2154#discussion_r611252750

## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/partition/drop/AlterTableDropPartitionOperation.java

@@ -120,6 +126,12 @@ private void dropPartitions() throws HiveException {
     List<Partition> droppedPartitions = context.getDb().dropPartitions(tablenName.getDb(), tablenName.getTable(),
         partitionExpressions, options);
+    if (isRepl) {

Review comment: It does a lot of stuff below related to ``llap`` and printing to the console, which I thought would be irrelevant here, so I added this check. So, in the case of replication, I skip the part below and return from here itself. There is something like this below:
``
// We have already locked the table, don't lock the partitions.
DDLUtils.addIfAbsentByName(new WriteEntity(partition, WriteEntity.WriteType.DDL_NO_LOCK), context);
``
It is only this DDLUtils part I am not very sure about. Let me know if you find it relevant and I will add it in my block as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581801) Time Spent: 40m (was: 0.5h) > Optimise number of DROP_PARTITION events created. 
> - > > Key: HIVE-24978 > URL: https://issues.apache.org/jira/browse/HIVE-24978 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Even for drop partition with batches, presently there is one event for every > partition, optimise to merge them, to save the number of calls to HMS -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
[ https://issues.apache.org/jira/browse/HIVE-25006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25006: -- Labels: pull-request-available (was: ) > Commit Iceberg writes in HiveMetaHook instead of TezAM > -- > > Key: HIVE-25006 > URL: https://issues.apache.org/jira/browse/HIVE-25006 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. > This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
[ https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=581779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581779 ] ASF GitHub Bot logged work on HIVE-25006: - Author: ASF GitHub Bot Created on: 13/Apr/21 13:39 Start Date: 13/Apr/21 13:39 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2161: URL: https://github.com/apache/hive/pull/2161#discussion_r612455851 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ## @@ -250,9 +255,32 @@ public int execute() { this.setException(new HiveException(monitor.getDiagnostics())); } -// fetch the counters try { Set<StatusGetOpts> statusGetOpts = EnumSet.of(StatusGetOpts.GET_COUNTERS); + // save useful commit information into session conf, e.g. for custom commit hooks + List<BaseWork> allWork = work.getAllWork(); + boolean hasReducer = allWork.stream().map(workToVertex::get).anyMatch(v -> v.getName().startsWith("Reducer")); + for (BaseWork baseWork : allWork) { +Vertex vertex = workToVertex.get(baseWork); +if (!hasReducer || vertex.getName().startsWith("Reducer")) { + // construct the parsable job id + VertexStatus status = dagClient.getVertexStatus(vertex.getName(), statusGetOpts); + String[] jobIdParts = status.getId().split("_"); + // status.getId() returns something like: vertex_1617722404520_0001_1_00 + // this should be transformed to a parsable JobID: job_16177224045200_0001 + int vertexId = Integer.parseInt(jobIdParts[jobIdParts.length - 1]); + String jobId = String.format("job_%s%d_%s", jobIdParts[1], vertexId, jobIdParts[2]); + // prefix with table name (for multi-table inserts), if available + String tableName = Optional.ofNullable(workToConf.get(baseWork)).map(c -> c.get("name")).orElse(null); + String jobIdKey = HIVE_TEZ_COMMIT_JOB_ID + (tableName == null ? "" : "." + tableName);; + String taskCountKey = HIVE_TEZ_COMMIT_TASK_COUNT + (tableName == null ? "" : "." 
+ tableName); + // save info into session conf + HiveConf sessionConf = SessionState.get().getConf(); + sessionConf.set(jobIdKey, jobId); + sessionConf.setInt(taskCountKey, status.getProgress().getSucceededTaskCount()); Review comment: I'll look into this in the following PR, once we've replaced the temporary listing solution with the permanent one and upgraded the Tez dependency. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581779) Remaining Estimate: 0h Time Spent: 10m > Commit Iceberg writes in HiveMetaHook instead of TezAM > -- > > Key: HIVE-25006 > URL: https://issues.apache.org/jira/browse/HIVE-25006 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. > This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
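The vertex-id rewrite quoted in the diff above can be sketched in isolation. This is a minimal stand-alone version of the transformation described in the code comments (a Tez vertex status id such as `vertex_1617722404520_0001_1_00` becomes the parsable Hadoop JobID `job_16177224045200_0001`); the class and helper name below are hypothetical, not part of the actual `TezTask` patch.

```java
public class JobIdSketch {
    // Hypothetical helper mirroring the transformation in the diff:
    // split the vertex status id on '_', take the trailing vertex index,
    // and recombine the application timestamp, index and dag id into a JobID.
    static String toJobId(String vertexStatusId) {
        String[] jobIdParts = vertexStatusId.split("_");
        // "00" parses to 0, so the zero-padding is dropped when appended
        int vertexId = Integer.parseInt(jobIdParts[jobIdParts.length - 1]);
        return String.format("job_%s%d_%s", jobIdParts[1], vertexId, jobIdParts[2]);
    }

    public static void main(String[] args) {
        // example taken from the code comment in the diff
        System.out.println(toJobId("vertex_1617722404520_0001_1_00"));
    }
}
```

Note that because `Integer.parseInt` strips leading zeros, distinct vertex ids like `00` and `000` would collapse to the same suffix; the diff only relies on this for a single vertex per table, so the sketch keeps the same behaviour.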
[jira] [Assigned] (HIVE-25008) Migrate hive table data into Iceberg format.
[ https://issues.apache.org/jira/browse/HIVE-25008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér reassigned HIVE-25008: > Migrate hive table data into Iceberg format. > > > Key: HIVE-25008 > URL: https://issues.apache.org/jira/browse/HIVE-25008 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > > We should provide a way to migrate native hive table data files into Iceberg > format with just a simple ALTER TABLE ... SET > TBLPROPERTIES('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler') > command. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24876) Disable /logconf.jsp page on HS2 web UI for non admin users
[ https://issues.apache.org/jira/browse/HIVE-24876?focusedWorklogId=581749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581749 ] ASF GitHub Bot logged work on HIVE-24876: - Author: ASF GitHub Bot Created on: 13/Apr/21 12:57 Start Date: 13/Apr/21 12:57 Worklog Time Spent: 10m Work Description: yongzhi merged pull request #2063: URL: https://github.com/apache/hive/pull/2063 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581749) Time Spent: 1h 10m (was: 1h) > Disable /longconf.jsp page on HS2 web UI for non admin users > > > Key: HIVE-24876 > URL: https://issues.apache.org/jira/browse/HIVE-24876 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > /logconf.jsp page should be disabled to the users that are not in admin > roles. Otherwise, any user can flood the log files with different log levels > that can be configured on HS2 web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24958) Create Iceberg catalog module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-24958. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the patch [~Marton Bod] and [~lpinter] for the review! > Create Iceberg catalog module in Hive > - > > Key: HIVE-24958 > URL: https://issues.apache.org/jira/browse/HIVE-24958 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > * Create a new iceberg-catalog module in Hive, with the code currently > contained in Iceberg's iceberg-hive-metastore module > * Make sure all tests pass (including static analysis and checkstyle) > * Make iceberg-handler depend on this module instead of > iceberg-hive-metastore -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24958) Create Iceberg catalog module in Hive
[ https://issues.apache.org/jira/browse/HIVE-24958?focusedWorklogId=581731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581731 ] ASF GitHub Bot logged work on HIVE-24958: - Author: ASF GitHub Bot Created on: 13/Apr/21 12:31 Start Date: 13/Apr/21 12:31 Worklog Time Spent: 10m Work Description: pvary merged pull request #2138: URL: https://github.com/apache/hive/pull/2138 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581731) Time Spent: 50m (was: 40m) > Create Iceberg catalog module in Hive > - > > Key: HIVE-24958 > URL: https://issues.apache.org/jira/browse/HIVE-24958 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > * Create a new iceberg-catalog module in Hive, with the code currently > contained in Iceberg's iceberg-hive-metastore module > * Make sure all tests pass (including static analysis and checkstyle) > * Make iceberg-handler depend on this module instead of > iceberg-hive-metastore -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581698 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 11:06 Start Date: 13/Apr/21 11:06 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612347286 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: Thanks Laszlo! sounds like a plan! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581698) Time Spent: 2.5h (was: 2h 20m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. 
> Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25007) Implement insert overwrite for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-25007: - > Implement insert overwrite for Iceberg tables > - > > Key: HIVE-25007 > URL: https://issues.apache.org/jira/browse/HIVE-25007 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581694=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581694 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 11:00 Start Date: 13/Apr/21 11:00 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612343543 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: okay, in this case I'll have to include some unit tests here (which are part of tez codebase already) + create a simple repro to share with netty community -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581694) Time Spent: 2h 20m (was: 2h 10m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581692 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 11:00 Start Date: 13/Apr/21 11:00 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612343543 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: okay, in this case I'll have to include some unit tests here (which might be part of tez code already) + create a simple repro to share with netty community -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581692) Time Spent: 2h 10m (was: 2h) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita resolved HIVE-24981. --- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master. Thanks for the review [~pvary] > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. > We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581689 ] ASF GitHub Bot logged work on HIVE-24981: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:51 Start Date: 13/Apr/21 10:51 Worklog Time Spent: 10m Work Description: szlta merged pull request #2168: URL: https://github.com/apache/hive/pull/2168 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581689) Time Spent: 1h (was: 50m) > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. > We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581687 ] ASF GitHub Bot logged work on HIVE-24981: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:49 Start Date: 13/Apr/21 10:49 Worklog Time Spent: 10m Work Description: szlta commented on a change in pull request #2168: URL: https://github.com/apache/hive/pull/2168#discussion_r612337484 ## File path: ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.util; + +import java.util.List; +import java.util.Map; +import java.util.TreeMap; + +public class HiveStrictManagedMigrationControlConfig { + + private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>(); + + public Map<String, List<String>> getDatabaseIncludeLists() { +return databaseIncludeLists; + } + + public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) { +this.databaseIncludeLists = databaseIncludeLists; + } + + public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) { +for (String db : other.getDatabaseIncludeLists().keySet()) { + List<String> theseTables = this.databaseIncludeLists.get(db); + List<String> otherTables = other.getDatabaseIncludeLists().get(db); + if (theseTables == null) { +this.databaseIncludeLists.put(db, otherTables); Review comment: I want to merge the lists if they're present. To do this, checking their existence is something I need to do anyway, so putIfAbsent cannot make my code more compact here unfortunately. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581687) Time Spent: 50m (was: 40m) > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. 
> We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
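The merge semantics discussed in the review thread above (concatenate per-database table lists when both configs name the same DB, rather than letting one list shadow the other as `putIfAbsent` would) can be sketched with `Map.merge`. The class and method names here are hypothetical stand-ins, not the actual patch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {
    // Hypothetical sketch: fold `other` into `target`, concatenating the
    // table lists when a database key exists in both maps.
    static void putAll(Map<String, List<String>> target, Map<String, List<String>> other) {
        other.forEach((db, tables) ->
            target.merge(db, tables, (existing, incoming) -> {
                // build a fresh list so neither input list is mutated
                List<String> merged = new ArrayList<>(existing);
                merged.addAll(incoming);
                return merged;
            }));
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new TreeMap<>();
        a.put("db1", List.of("t1"));
        Map<String, List<String>> b = new TreeMap<>();
        b.put("db1", List.of("t2"));
        b.put("db2", List.of("t3"));
        putAll(a, b);
        System.out.println(a);
    }
}
```

With `Map.merge` the existence check and the list concatenation collapse into one call, which is the compactness trade-off the reviewers were weighing against an explicit null check.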
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581684 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:44 Start Date: 13/Apr/21 10:44 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612334078 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: Got it, this is helpful but lets make sure this is expected from nettys' side of things before committing -- this would be helpful for the Tez change as well :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581684) Time Spent: 2h (was: 1h 50m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581683 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:38 Start Date: 13/Apr/21 10:38 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612330581 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: good catch :) this is an epic workaround for a problem that I haven't been able to figure out 100%, here are some details: https://issues.apache.org/jira/browse/TEZ-4157?focusedCommentId=17100835=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17100835 (btw: with netty3, we didn't need this) are you fine with a comment explaining this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581683) Time Spent: 1h 50m (was: 1h 40m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581682 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:38 Start Date: 13/Apr/21 10:38 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612330581 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -797,16 +803,17 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent evt) Map mapOutputInfoMap = new HashMap(); - Channel ch = evt.getChannel(); - + Channel ch = ctx.channel(); // In case of KeepAlive, ensure that timeout handler does not close connection until entire // response is written (i.e, response headers + mapOutput). - ChannelPipeline pipeline = ch.getPipeline(); + ChannelPipeline pipeline = ch.pipeline(); TimeoutHandler timeoutHandler = (TimeoutHandler)pipeline.get(TIMEOUT_HANDLER); timeoutHandler.setEnabledTimeout(false); String user = userRsrc.get(jobId); - + if (keepAliveParam || connectionKeepAliveEnabled){ Review comment: good catch :) this is an epic workaround that I haven't been able to figure out, here are some details: https://issues.apache.org/jira/browse/TEZ-4157?focusedCommentId=17100835=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17100835 (btw: with netty3, we didn't need this) are you fine with a comment explaining this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581682) Time Spent: 1h 40m (was: 1.5h) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581680 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:36 Start Date: 13/Apr/21 10:36 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612329298 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java ## @@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long position) throws IOException { if (manageOsCache && readaheadPool != null) { readaheadRequest = readaheadPool.readaheadStream(identifier, fd, - getPosition() + position, readaheadLength, - getPosition() + getCount(), readaheadRequest); + position() + position, readaheadLength, + position() + count(), readaheadRequest); } - +long written = 0; Review comment: Got it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581680) Time Spent: 1.5h (was: 1h 20m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). 
> Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581679 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:34 Start Date: 13/Apr/21 10:34 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612328280 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java ## @@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long position) throws IOException { if (manageOsCache && readaheadPool != null) { readaheadRequest = readaheadPool.readaheadStream(identifier, fd, - getPosition() + position, readaheadLength, - getPosition() + getCount(), readaheadRequest); + position() + position, readaheadLength, + position() + count(), readaheadRequest); } - +long written = 0; Review comment: looks better, but I don't think it's correct: in case of an exception during the transfer, we should not have set transferred=true -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581679) Time Spent: 1h 20m (was: 1h 10m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. 
> However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
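The ordering concern raised in the review above can be sketched with a small stand-in class. `FileRegionLike` and its byte counts are hypothetical illustrations, not the actual `FadvisedFileRegion` API; the point is only that the `transferred` flag is set after the transfer returns normally:

```java
import java.io.IOException;

// Minimal sketch of the review point: the "transferred" flag is set only
// after the underlying transfer returns normally, so an exception thrown
// mid-transfer leaves the region marked as not yet transferred.
class FileRegionLike {
    boolean transferred = false;

    long transferTo(boolean failMidway) throws IOException {
        long written = doTransfer(failMidway); // may throw
        transferred = true;                    // reached only on success
        return written;
    }

    private long doTransfer(boolean failMidway) throws IOException {
        if (failMidway) {
            throw new IOException("connection reset during transfer");
        }
        return 4096L; // pretend 4 KiB were written
    }
}
```

Hoisting `transferred = true;` above the transfer call, as in the simplification proposed earlier in the thread, would mark a region as transferred even when an exception aborts the write.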
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581677 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:31 Start Date: 13/Apr/21 10:31 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612326445 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java ## @@ -124,39 +149,33 @@ long customShuffleTransfer(WritableByteChannel target, long position) position += trans; trans = 0; } - + //write data to the target while(byteBuffer.hasRemaining()) { target.write(byteBuffer); } byteBuffer.clear(); } - + return actualCount - trans; } - - @Override - public void releaseExternalResources() { -if (readaheadRequest != null) { - readaheadRequest.cancel(); -} -super.releaseExternalResources(); - } - /** * Call when the transfer completes successfully so we can advise the OS that * we don't need the region to be cached anymore. 
*/ public void transferSuccessful() { -if (manageOsCache && getCount() > 0) { +if (manageOsCache && count() > 0) { try { if (canEvictAfterTransfer) { - LOG.debug("shuffleBufferSize: {}, path: {}", shuffleBufferSize, identifier); - NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(identifier, - fd, getPosition(), getCount(), - NativeIO.POSIX.POSIX_FADV_DONTNEED); + if (fd.valid()) { Review comment: hm, thought this over again: the fd.valid() change was needed while I wasn't handling the deallocate() stuff properly, but now, at this point, fd should be valid... initially I left this check here because I thought that an invalid fd is not a problem (which is true, we won't advise the OS cache, and that's it), but as we already have the try/catch, we don't need this check (we'll have the exception in the logs anyway) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581677) Time Spent: 1h 10m (was: 1h) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). 
> Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
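The reasoning in the comment above (an invalid fd only costs the OS cache hint, and the surrounding try/catch already absorbs the failure) can be sketched as follows; `CacheAdvisor` and `TransferCompletion` are made-up names for illustration, not the Hadoop `NativeIO` API:

```java
// Hypothetical sketch: posix_fadvise-style advice is best-effort. A failing
// advisory call (e.g. on an invalid file descriptor) is swallowed, because
// skipping the cache hint is harmless; real code would log the exception.
interface CacheAdvisor {
    void adviseDontNeed(long position, long count) throws Exception;
}

class TransferCompletion {
    static boolean adviseAfterTransfer(CacheAdvisor advisor, long position, long count) {
        try {
            advisor.adviseDontNeed(position, count);
            return true;   // advice delivered
        } catch (Exception e) {
            return false;  // best-effort: no advice given, nothing else breaks
        }
    }
}
```

This is why the extra `fd.valid()` guard is redundant: the catch block already covers the invalid-descriptor case.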
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581674 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:27 Start Date: 13/Apr/21 10:27 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612323669 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -339,27 +350,60 @@ private ShuffleHandler(Configuration conf) { public void start() throws Exception { -ServerBootstrap bootstrap = new ServerBootstrap(selector); -// Timer is shared across entire factory and must be released separately -timer = new HashedWheelTimer(); -try { - pipelineFact = new HttpPipelineFactory(conf, timer); -} catch (Exception ex) { - throw new RuntimeException(ex); -} -bootstrap.setPipelineFactory(pipelineFact); -bootstrap.setOption("backlog", NetUtil.SOMAXCONN); +ServerBootstrap bootstrap = new ServerBootstrap() +.channel(NioServerSocketChannel.class) +.group(bossGroup, workerGroup) +.localAddress(port) +.option(ChannelOption.SO_BACKLOG, NetUtil.SOMAXCONN) +.childOption(ChannelOption.SO_KEEPALIVE, true); +initPipeline(bootstrap, conf); + port = conf.getInt(SHUFFLE_PORT_CONFIG_KEY, DEFAULT_SHUFFLE_PORT); -Channel ch = bootstrap.bind(new InetSocketAddress(port)); +Channel ch = bootstrap.bind().sync().channel(); accepted.add(ch); -port = ((InetSocketAddress)ch.getLocalAddress()).getPort(); +port = ((InetSocketAddress)ch.localAddress()).getPort(); conf.set(SHUFFLE_PORT_CONFIG_KEY, Integer.toString(port)); -pipelineFact.SHUFFLE.setPort(port); +SHUFFLE.setPort(port); if (dirWatcher != null) { dirWatcher.start(); } -LOG.info("LlapShuffleHandler" + " listening on port " + port + " (SOMAXCONN: " + bootstrap.getOption("backlog") - + ")"); +LOG.info("LlapShuffleHandler listening on port {} (SOMAXCONN: {})", 
port, NetUtil.SOMAXCONN); + } + + private void initPipeline(ServerBootstrap bootstrap, Configuration conf) throws Exception { +SHUFFLE = getShuffle(conf); +// TODO Setup SSL Shuffle Review comment: I think we don't support SSL shuffle for LLAP at the moment (+ the comment is quite old); in, e.g., Cloudera's data warehouse, SSL on shuffle is handled transparently by the environment. I haven't touched this part in this patch, and I'm not even sure what the plan is :) that's why I simply kept this as is -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581674) Time Spent: 1h (was: 50m) > LLAP ShuffleHandler: upgrade to netty4 > -- > > Key: HIVE-24524 > URL: https://issues.apache.org/jira/browse/HIVE-24524 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Tez already has a WIP patch for upgrading its shuffle handler to netty4. > Netty4 is told to be a possible performance improvement compared to Netty3. > However, the refactor is not trivial, TEZ-4157 covers that more or less (the > code bases are very similar). > Background: > netty4 migration guideline: > https://netty.io/wiki/new-and-noteworthy-in-4.0.html > articles of possible performance improvement: > https://blog.twitter.com/engineering/en_us/a/2013/netty-4-at-twitter-reduced-gc-overhead.html > https://developer.squareup.com/blog/upgrading-a-reverse-proxy-from-netty-3-to-4/ > some other notes: Netty3 is EOL since 2016: > https://netty.io/news/2016/06/29/3-10-6-Final.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24524) LLAP ShuffleHandler: upgrade to netty4
[ https://issues.apache.org/jira/browse/HIVE-24524?focusedWorklogId=581667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581667 ] ASF GitHub Bot logged work on HIVE-24524: - Author: ASF GitHub Bot Created on: 13/Apr/21 10:11 Start Date: 13/Apr/21 10:11 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1778: URL: https://github.com/apache/hive/pull/1778#discussion_r612289762 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java ## @@ -71,15 +72,39 @@ public long transferTo(WritableByteChannel target, long position) throws IOException { if (manageOsCache && readaheadPool != null) { readaheadRequest = readaheadPool.readaheadStream(identifier, fd, - getPosition() + position, readaheadLength, - getPosition() + getCount(), readaheadRequest); + position() + position, readaheadLength, + position() + count(), readaheadRequest); } - +long written = 0; Review comment: Shall we simplify this to: ``` transferred = true; if (this.shuffleTransferToAllowed) { return super.transferTo(target, position); } return customShuffleTransfer(target, position); ``` ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -339,27 +350,60 @@ private ShuffleHandler(Configuration conf) { public void start() throws Exception { -ServerBootstrap bootstrap = new ServerBootstrap(selector); -// Timer is shared across entire factory and must be released separately -timer = new HashedWheelTimer(); -try { - pipelineFact = new HttpPipelineFactory(conf, timer); -} catch (Exception ex) { - throw new RuntimeException(ex); -} -bootstrap.setPipelineFactory(pipelineFact); -bootstrap.setOption("backlog", NetUtil.SOMAXCONN); +ServerBootstrap bootstrap = new ServerBootstrap() +.channel(NioServerSocketChannel.class) +.group(bossGroup, workerGroup) +.localAddress(port) +.option(ChannelOption.SO_BACKLOG, NetUtil.SOMAXCONN) 
+.childOption(ChannelOption.SO_KEEPALIVE, true); +initPipeline(bootstrap, conf); + port = conf.getInt(SHUFFLE_PORT_CONFIG_KEY, DEFAULT_SHUFFLE_PORT); -Channel ch = bootstrap.bind(new InetSocketAddress(port)); +Channel ch = bootstrap.bind().sync().channel(); accepted.add(ch); -port = ((InetSocketAddress)ch.getLocalAddress()).getPort(); +port = ((InetSocketAddress)ch.localAddress()).getPort(); conf.set(SHUFFLE_PORT_CONFIG_KEY, Integer.toString(port)); -pipelineFact.SHUFFLE.setPort(port); +SHUFFLE.setPort(port); if (dirWatcher != null) { dirWatcher.start(); } -LOG.info("LlapShuffleHandler" + " listening on port " + port + " (SOMAXCONN: " + bootstrap.getOption("backlog") - + ")"); +LOG.info("LlapShuffleHandler listening on port {} (SOMAXCONN: {})", port, NetUtil.SOMAXCONN); + } + + private void initPipeline(ServerBootstrap bootstrap, Configuration conf) throws Exception { +SHUFFLE = getShuffle(conf); +// TODO Setup SSL Shuffle Review comment: I know this is copy pasted from below but do we have a ticket for this? Is it still needed? 
## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java ## @@ -1031,25 +1038,14 @@ protected ChannelFuture sendMapOutput(ChannelHandlerContext ctx, Channel ch, info.getStartOffset(), info.getPartLength(), manageOsCache, readaheadLength, readaheadPool, spillfile.getAbsolutePath(), shuffleBufferSize, shuffleTransferToAllowed, canEvictAfterTransfer); -writeFuture = ch.write(partition); -writeFuture.addListener(new ChannelFutureListener() { -// TODO error handling; distinguish IO/connection failures, -// attribute to appropriate spill output - @Override - public void operationComplete(ChannelFuture future) { -if (future.isSuccess()) { - partition.transferSuccessful(); -} -partition.releaseExternalResources(); - } -}); +writeFuture = ch.writeAndFlush(partition); Review comment: This looks much cleaner with deallocate() call replacing completion Listeners ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/FadvisedFileRegion.java ## @@ -124,39 +149,33 @@ long customShuffleTransfer(WritableByteChannel target, long position) position += trans; trans = 0; } - + //write data to the target while(byteBuffer.hasRemaining()) { target.write(byteBuffer); }
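The `deallocate()`-based cleanup praised in the last comment can be modeled with a toy reference-counted object. This is only an illustration of the pattern (netty4's real `AbstractReferenceCounted` is more involved): cleanup runs exactly once, when the reference count drops to zero, instead of in a per-write `ChannelFutureListener`:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of netty4-style resource management: deallocate() is invoked
// automatically when the refcount reaches zero, e.g. to cancel a pending
// readahead request, so no explicit completion listener is needed.
class RefCountedRegion {
    private final AtomicInteger refCnt = new AtomicInteger(1);
    int deallocations = 0;

    void retain() { refCnt.incrementAndGet(); }

    boolean release() {
        if (refCnt.decrementAndGet() == 0) {
            deallocate();
            return true;
        }
        return false;
    }

    private void deallocate() {
        deallocations++; // cleanup happens here, exactly once
    }
}
```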
[jira] [Work logged] (HIVE-24974) Create new metrics about the number of delta files in the ACID table
[ https://issues.apache.org/jira/browse/HIVE-24974?focusedWorklogId=581657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581657 ] ASF GitHub Bot logged work on HIVE-24974: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:43 Start Date: 13/Apr/21 09:43 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2148: URL: https://github.com/apache/hive/pull/2148#discussion_r612295409 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.txn.compactor.metrics; + +import com.google.common.cache.Cache; +import com.google.common.cache.CacheBuilder; + +import com.google.common.cache.RemovalNotification; +import com.google.common.collect.Maps; +import com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.common.metrics.common.Metrics; +import org.apache.hadoop.hive.common.metrics.common.MetricsFactory; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.io.AcidDirectory; +import org.apache.hadoop.hive.ql.io.AcidUtils; + +import org.apache.tez.common.counters.TezCounter; +import org.apache.tez.common.counters.TezCounters; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.Comparator; + +import java.util.Queue; +import java.util.TreeMap; +import java.util.concurrent.Executors; +import java.util.concurrent.BlockingQueue; +import java.util.concurrent.PriorityBlockingQueue; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ThreadFactory; +import java.util.concurrent.TimeUnit; + +/** + * Collects and publishes ACID compaction related metrics. + */ +public class DeltaFilesMetricReporter { + + private static final Logger LOG = LoggerFactory.getLogger(AcidUtils.class); + + public static final String NUM_OBSOLETE_DELTAS = "HIVE_ACID_NUM_OBSOLETE_DELTAS"; Review comment: I think also part of the plan was a 3rd metric : Number of deltas where the size is less than x % of the base (small deltas)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581657) Time Spent: 0.5h (was: 20m) > Create new metrics about the number of delta files in the ACID table > > > Key: HIVE-24974 > URL: https://issues.apache.org/jira/browse/HIVE-24974 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > 2 metrics should be collected for each table/partition that exceeds some limit: > * Number of used deltas > * Number of obsolete deltas > Both of them should be collected in the AcidUtils.getAcidState call, and only be > published if they reach a configurable threshold (to not pollute the metrics) -- This message was sent by Atlassian Jira (v8.3.4#803005)
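The threshold gating described in the issue ("only be published if they reached a configurable threshold") might look roughly like the following. `DeltaMetricsSketch` and the flat map of per-table delta counts are assumptions for illustration, not the actual `DeltaFilesMetricReporter` code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-table/partition delta counts are gathered first,
// then only entries at or above a configurable threshold are published,
// so healthy tables do not pollute the metrics system.
class DeltaMetricsSketch {
    static Map<String, Integer> published(Map<String, Integer> deltaCounts, int threshold) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : deltaCounts.entrySet()) {
            if (e.getValue() >= threshold) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

The same filter would apply to each metric (used deltas, obsolete deltas, and the small-delta count proposed in the review comment).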
[jira] [Work logged] (HIVE-24985) Create new metrics about locks
[ https://issues.apache.org/jira/browse/HIVE-24985?focusedWorklogId=581655=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581655 ] ASF GitHub Bot logged work on HIVE-24985: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:38 Start Date: 13/Apr/21 09:38 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2158: URL: https://github.com/apache/hive/pull/2158#discussion_r612292081 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java ## @@ -415,60 +415,64 @@ public void testDBMetrics() throws Exception { String dbName = "default"; String tblName = "dcamc"; Table t = newTable(dbName, tblName, false); -burnThroughTransactions(t.getDbName(), t.getTableName(), 24); -// create and commit txn with non-empty txn_components +long start = System.currentTimeMillis() - 1000L; Review comment: copy -> fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581655) Time Spent: 50m (was: 40m) > Create new metrics about locks > -- > > Key: HIVE-24985 > URL: https://issues.apache.org/jira/browse/HIVE-24985 > Project: Hive > Issue Type: Sub-task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Basic metrics that can help investigate. > Ideas: > * number of locks > * oldest lock's age -- This message was sent by Atlassian Jira (v8.3.4#803005)
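The two lock metrics suggested in the issue (number of locks, oldest lock's age) reduce to simple aggregations over lock creation timestamps. This `LockMetricsSketch` is an illustrative stand-in, not Hive's metastore code:

```java
import java.util.List;

// Hypothetical sketch of the two proposed lock metrics: the lock count is
// just the list size; the oldest lock's age is "now" minus the smallest
// creation timestamp (all times in epoch millis).
class LockMetricsSketch {
    static long oldestLockAgeMs(List<Long> lockCreateTimes, long nowMs) {
        if (lockCreateTimes.isEmpty()) {
            return 0L; // no locks, no age
        }
        long oldest = nowMs;
        for (long t : lockCreateTimes) {
            oldest = Math.min(oldest, t);
        }
        return nowMs - oldest;
    }
}
```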
[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581647 ] ASF GitHub Bot logged work on HIVE-24981: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:07 Start Date: 13/Apr/21 09:07 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2168: URL: https://github.com/apache/hive/pull/2168#discussion_r612270073 ## File path: ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.util; + +import java.util.List; +import java.util.Map; +import java.util.TreeMap; + +public class HiveStrictManagedMigrationControlConfig { + + private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>(); + + public Map<String, List<String>> getDatabaseIncludeLists() { +return databaseIncludeLists; + } + + public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) { +this.databaseIncludeLists = databaseIncludeLists; + } + + public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) { +for (String db : other.getDatabaseIncludeLists().keySet()) { + List<String> theseTables = this.databaseIncludeLists.get(db); + List<String> otherTables = other.getDatabaseIncludeLists().get(db); + if (theseTables == null) { +this.databaseIncludeLists.put(db, otherTables); Review comment: Nevermind -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581647) Time Spent: 40m (was: 0.5h) > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. > We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. 
A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
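The `putAllFromConfig` merge under review is truncated in the quoted diff, so its non-null branch is not visible. One plausible merge, sketched below under the hypothetical name `IncludeListMerge`, unions the per-database table lists; `putIfAbsent`, as suggested in the review, would instead keep only the first list seen for a database:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of merging several control files' include lists.
// Tables for a database mentioned in both inputs are unioned rather than
// replaced; this is one plausible reading of the truncated diff.
class IncludeListMerge {
    static Map<String, List<String>> merge(Map<String, List<String>> a,
                                           Map<String, List<String>> b) {
        Map<String, List<String>> out = new TreeMap<>();
        a.forEach((db, tables) -> out.put(db, new ArrayList<>(tables)));
        b.forEach((db, tables) ->
            out.computeIfAbsent(db, k -> new ArrayList<>()).addAll(tables));
        return out;
    }
}
```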
[jira] [Work logged] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?focusedWorklogId=581644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581644 ] ASF GitHub Bot logged work on HIVE-25004: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:06 Start Date: 13/Apr/21 09:06 Worklog Time Spent: 10m Work Description: zeroflag commented on pull request #2170: URL: https://github.com/apache/hive/pull/2170#issuecomment-818578373 cc: @mustafaiman -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581644) Time Spent: 20m (was: 10m) > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581641 ] ASF GitHub Bot logged work on HIVE-24981: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:06 Start Date: 13/Apr/21 09:06 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2168: URL: https://github.com/apache/hive/pull/2168#discussion_r612268649 ## File path: ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigrationControlConfig.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.util; + +import java.util.List; +import java.util.Map; +import java.util.TreeMap; + +public class HiveStrictManagedMigrationControlConfig { + + private Map<String, List<String>> databaseIncludeLists = new TreeMap<String, List<String>>(); + + public Map<String, List<String>> getDatabaseIncludeLists() { +return databaseIncludeLists; + } + + public void setDatabaseIncludeLists(Map<String, List<String>> databaseIncludeLists) { +this.databaseIncludeLists = databaseIncludeLists; + } + + public void putAllFromConfig(HiveStrictManagedMigrationControlConfig other) { +for (String db : other.getDatabaseIncludeLists().keySet()) { + List<String> theseTables = this.databaseIncludeLists.get(db); + List<String> otherTables = other.getDatabaseIncludeLists().get(db); + if (theseTables == null) { +this.databaseIncludeLists.put(db, otherTables); Review comment: Maybe: putIfAbsent? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581641) Time Spent: 0.5h (was: 20m) > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. > We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. 
A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
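The `putIfAbsent` suggestion from the review thread above could collapse the null-check branch in `putAllFromConfig` into a single map call. A minimal standalone sketch of that variant follows; the class name `MergeSketch` and the sample database/table names are invented for illustration, and only the map shape (`Map<String, List<String>>`) comes from the patch under review:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {

    // Hypothetical stand-in for the merge logic in
    // HiveStrictManagedMigrationControlConfig.putAllFromConfig.
    static void putAllFromConfig(Map<String, List<String>> target,
                                 Map<String, List<String>> other) {
        for (Map.Entry<String, List<String>> e : other.entrySet()) {
            // putIfAbsent returns null when the key was absent (and stores the
            // new list); otherwise it returns the existing list, to which the
            // other config's tables are appended.
            List<String> existing = target.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null) {
                existing.addAll(e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new TreeMap<>();
        a.put("db1", new ArrayList<>(List.of("t1")));

        Map<String, List<String>> b = new TreeMap<>();
        b.put("db1", List.of("t2"));
        b.put("db2", List.of("t3"));

        putAllFromConfig(a, b);
        System.out.println(a); // {db1=[t1, t2], db2=[t3]}
    }
}
```

Note that, as in the quoted patch, the absent-key case stores a reference to the other config's list rather than a copy, so callers that mutate the source config afterwards would see the change reflected.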
[jira] [Work logged] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?focusedWorklogId=581642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581642 ] ASF GitHub Bot logged work on HIVE-24997: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:06 Start Date: 13/Apr/21 09:06 Worklog Time Spent: 10m Work Description: zeroflag commented on pull request #2166: URL: https://github.com/apache/hive/pull/2166#issuecomment-818578215 cc: @mustafaiman -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581642) Time Spent: 20m (was: 10m) > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Since HIVE-24230 it assumes the UDF is evaluated on HS2, which is not true > in general. The SessionState is only available at compile-time evaluation, but > later on a new interpreter should be instantiated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24981) Add control file option to HiveStrictManagedMigration for DB/table selection
[ https://issues.apache.org/jira/browse/HIVE-24981?focusedWorklogId=581640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581640 ] ASF GitHub Bot logged work on HIVE-24981: - Author: ASF GitHub Bot Created on: 13/Apr/21 09:02 Start Date: 13/Apr/21 09:02 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2168: URL: https://github.com/apache/hive/pull/2168#discussion_r612266398 ## File path: ql/src/test/resources/hsmm/hsmm_cfg_01.yaml ## @@ -0,0 +1,9 @@ +databaseIncludeLists: Review comment: It took me some time to understand that `databaseIncludeLists` controls the tables too. Maybe `migrationLists`? or something like this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581640) Time Spent: 20m (was: 10m) > Add control file option to HiveStrictManagedMigration for DB/table selection > > > Key: HIVE-24981 > URL: https://issues.apache.org/jira/browse/HIVE-24981 > Project: Hive > Issue Type: Improvement >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently HiveStrictManagedMigration supports db regex and table regex > options that allow the user to specify what Hive entities it should deal > with. In cases where we have thousands of tables across thousands of DBs > iterating through everything takes a lot of time, while specifying a set of > tables/DBs with regexes is cumbersome. > We should make it available for users to prepare control files with the lists > of required items to migrate and feed this to the tool. A directory path > pointing to these control files would be taken as a new option for HSMM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
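For readers following the HIVE-24981 thread, a control file matching the `databaseIncludeLists` key quoted from `hsmm_cfg_01.yaml` might look roughly like the sketch below. Only the top-level key name appears in the review; the database and table names, and the assumption that each database maps to a list of its tables, are illustrative guesses based on the `Map<String, List<String>>` shape of the config class in the patch:

```
# Hypothetical HSMM control file sketch -- only the databaseIncludeLists
# key is confirmed by the review thread; the names below are invented.
databaseIncludeLists:
  salesdb:
    - orders
    - order_items
  reportingdb:
    - daily_summary
```

This also illustrates the reviewer's naming concern: the key selects tables per database, not just databases, which is why `migrationLists` was floated as a clearer alternative.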
[jira] [Assigned] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
[ https://issues.apache.org/jira/browse/HIVE-25006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod reassigned HIVE-25006: - > Commit Iceberg writes in HiveMetaHook instead of TezAM > -- > > Key: HIVE-25006 > URL: https://issues.apache.org/jira/browse/HIVE-25006 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > > Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. > This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)