[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=554088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554088 ]

ASF GitHub Bot logged work on HIVE-24791:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 06:27
            Start Date: 18/Feb/21 06:27
    Worklog Time Spent: 10m

Work Description: aasha commented on a change in pull request #1988:
URL: https://github.com/apache/hive/pull/1988#discussion_r578161861

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/DumpMetaData.java

## @@ -131,7 +131,8 @@ private void loadDumpFromFile() throws SemanticException {
           lineContents[2].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[2]),
           lineContents[3].equals(Utilities.nullStringOutput) ? null : new Path(lineContents[3]),
           lineContents[4].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[4]),
-          Boolean.valueOf(lineContents[6]));
+          (lineContents.length < 8 || lineContents[6].equals(Utilities.nullStringOutput)) ?

Review comment: Add a test

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 554088)
    Time Spent: 0.5h (was: 20m)

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
>          Key: HIVE-24791
>          URL: https://issues.apache.org/jira/browse/HIVE-24791
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Arko Sharma
>     Assignee: Arko Sharma
>     Priority: Major
>       Labels: pull-request-available
>  Attachments: HIVE-24791.01.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=554087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554087 ]

ASF GitHub Bot logged work on HIVE-24791:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 06:27
            Start Date: 18/Feb/21 06:27
    Worklog Time Spent: 10m

Work Description: aasha commented on a change in pull request #1988:
URL: https://github.com/apache/hive/pull/1988#discussion_r578161679

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/DumpMetaData.java

## @@ -131,7 +131,8 @@ private void loadDumpFromFile() throws SemanticException {
           lineContents[2].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[2]),
           lineContents[3].equals(Utilities.nullStringOutput) ? null : new Path(lineContents[3]),
           lineContents[4].equals(Utilities.nullStringOutput) ? null : Long.valueOf(lineContents[4]),
-          Boolean.valueOf(lineContents[6]));
+          (lineContents.length < 8 || lineContents[6].equals(Utilities.nullStringOutput)) ?

Review comment: check for length < 7

Issue Time Tracking
-------------------
    Worklog Id: (was: 554087)
    Time Spent: 20m (was: 10m)

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
>          Key: HIVE-24791
>          URL: https://issues.apache.org/jira/browse/HIVE-24791
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Arko Sharma
>     Assignee: Arko Sharma
>     Priority: Major
>       Labels: pull-request-available
>  Attachments: HIVE-24791.01.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
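The two review comments above disagree only on the exact length bound (`< 8` vs `< 7`) for a dump line written before the new field existed. A minimal, hypothetical sketch of the underlying pattern — treating a trailing field as absent when the line is too short or holds a null marker; the field index and the `"null"` marker here are illustrative, not DumpMetaData's actual layout:

```java
public class DumpMetaLine {

    // Returns null when the field is missing (older dump format) or holds
    // the explicit null marker; otherwise parses it as a boolean.
    static Boolean readOptionalBoolean(String[] fields, int idx) {
        if (fields.length <= idx || "null".equals(fields[idx])) {
            return null;
        }
        return Boolean.valueOf(fields[idx]);
    }

    public static void main(String[] args) {
        String[] oldFormat = {"BOOTSTRAP", "0", "10"};          // no flag field
        String[] newFormat = {"BOOTSTRAP", "0", "10", "true"};  // flag appended
        System.out.println(readOptionalBoolean(oldFormat, 3)); // null
        System.out.println(readOptionalBoolean(newFormat, 3)); // true
    }
}
```

The point of the length guard is that an older writer simply never emitted the field, so indexing past the array end must be treated the same as an explicit null.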
[jira] [Work logged] (HIVE-24751) Workload Manager sees `No privilege` exception even when authorization is not enabled
[ https://issues.apache.org/jira/browse/HIVE-24751?focusedWorklogId=554058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554058 ]

ASF GitHub Bot logged work on HIVE-24751:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 04:44
            Start Date: 18/Feb/21 04:44
    Worklog Time Spent: 10m

Work Description: guptanikhil007 commented on pull request #1964:
URL: https://github.com/apache/hive/pull/1964#issuecomment-781045969

@pvargacl, @szlta, @sankarh Can you please review this PR?

Issue Time Tracking
-------------------
    Worklog Id: (was: 554058)
    Time Spent: 0.5h (was: 20m)

> Workload Manager sees `No privilege` exception even when authorization is not enabled
> -------------------------------------------------------------------------------------
>
>               Key: HIVE-24751
>               URL: https://issues.apache.org/jira/browse/HIVE-24751
>           Project: Hive
>        Issue Type: Bug
>  Affects Versions: 4.0.0
>          Reporter: Nikhil Gupta
>          Assignee: Nikhil Gupta
>          Priority: Major
>            Labels: pull-request-available
>           Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At present, Kill Query access does not check whether authorization is enabled.
> This causes the Workload Manager thread to end up with a "No privilege" exception
> when trying to kill a query in an environment where authorization is disabled.
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: No privilege
>     at org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:188)
>     at org.apache.hadoop.hive.ql.exec.tez.WorkloadManager.lambda$scheduleWork$3(WorkloadManager.java:454)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hive.service.cli.HiveSQLException: No privilege
>     at org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:167)
>     ... 6 more
> {code}
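The fix the description implies is a guard of roughly this shape: the privilege check only applies when authorization is actually enabled. This is a standalone sketch with illustrative names, not the actual KillQueryImpl API:

```java
public class KillQueryAccess {

    // Models the decision: with authorization disabled, kill-query is always
    // permitted; otherwise the caller must hold the required privilege.
    static boolean canKillQuery(boolean authorizationEnabled, boolean hasAdminPrivilege) {
        if (!authorizationEnabled) {
            return true;
        }
        return hasAdminPrivilege;
    }

    public static void main(String[] args) {
        // Workload Manager on a cluster without authorization: allowed.
        System.out.println(canKillQuery(false, false)); // true
        // Authorization on, caller unprivileged: this is the "No privilege" path.
        System.out.println(canKillQuery(true, false));  // false
    }
}
```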
[jira] [Resolved] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions
[ https://issues.apache.org/jira/browse/HIVE-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng resolved HIVE-24639.
--------------------------------
    Resolution: Duplicate

> Raises SemanticException other than ClassCastException when filter has non-boolean expressions
> ----------------------------------------------------------------------------------------------
>
>          Key: HIVE-24639
>          URL: https://issues.apache.org/jira/browse/HIVE-24639
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Zhihua Deng
>     Assignee: Zhihua Deng
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 4.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Sometimes we see a ClassCastException in filters when fetching rows of a table
> or executing a query. GenericUDFOPOr/GenericUDFOPAnd/FilterOperator assume
> that the output of their conditions is a boolean, but this is not guaranteed.
> For example:
> _select * from ccn_table where src + 1;_
> will throw a ClassCastException:
> {code:java}
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Boolean
>     at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:125)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:553)
>     ...
> {code}
> We'd better validate the filter during analysis instead of at runtime and
> produce a more meaningful message.
[jira] [Updated] (HIVE-24639) Raises SemanticException other than ClassCastException when filter has non-boolean expressions
[ https://issues.apache.org/jira/browse/HIVE-24639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-24639:
-------------------------------
    Fix Version/s: 4.0.0

> Raises SemanticException other than ClassCastException when filter has non-boolean expressions
> ----------------------------------------------------------------------------------------------
>
>          Key: HIVE-24639
>          URL: https://issues.apache.org/jira/browse/HIVE-24639
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Zhihua Deng
>     Assignee: Zhihua Deng
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 4.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
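The analysis-time check the HIVE-24639 description asks for can be modeled roughly as follows; the string-based type name is a simplified stand-in for the TypeInfo inspection Hive's semantic analyzer would actually perform:

```java
public class FilterTypeCheck {

    // Raises a descriptive analysis-time error when a WHERE expression does
    // not evaluate to boolean, instead of a runtime ClassCastException.
    static void validateFilterType(String exprText, String resultTypeName) {
        if (!"boolean".equalsIgnoreCase(resultTypeName)) {
            throw new IllegalArgumentException(
                "Filter expression '" + exprText + "' evaluates to type "
                + resultTypeName + "; a boolean expression is required");
        }
    }

    public static void main(String[] args) {
        validateFilterType("j is not null", "boolean"); // accepted
        try {
            validateFilterType("src + 1", "int");       // rejected at analysis time
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```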
[jira] [Work logged] (HIVE-24516) Txnhandler onrename might ignore exceptions
[ https://issues.apache.org/jira/browse/HIVE-24516?focusedWorklogId=554007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554007 ]

ASF GitHub Bot logged work on HIVE-24516:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 00:52
            Start Date: 18/Feb/21 00:52
    Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1762:
URL: https://github.com/apache/hive/pull/1762

Issue Time Tracking
-------------------
    Worklog Id: (was: 554007)
    Time Spent: 0.5h (was: 20m)

> Txnhandler onrename might ignore exceptions
> -------------------------------------------
>
>          Key: HIVE-24516
>          URL: https://issues.apache.org/jira/browse/HIVE-24516
>      Project: Hive
>   Issue Type: Bug
>   Components: Hive
>     Reporter: Peter Varga
>     Assignee: Peter Varga
>     Priority: Major
>       Labels: pull-request-available
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a followup on HIVE-24193. "Table not exists" errors shouldn't be
> ignored in the first place.
> {code}
> } catch (SQLException e) {
>   LOG.debug("Going to rollback: " + callSig);
>   rollbackDBConn(dbConn);
>   checkRetryable(dbConn, e, callSig);
>   if (e.getMessage().contains("does not exist")) {
>     LOG.warn("Cannot perform " + callSig + " since metastore table does not exist");
>   } else {
>     throw new MetaException("Unable to " + callSig + ":" + StringUtils.stringifyException(e));
>   }
> }
> {code}
> This error handling might have been put there for backward compatibility with
> missing ACID metadata tables, but it is not needed anymore.
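A hedged sketch of the fix direction described above: the SQLException is always rethrown, with no special case that swallows "does not exist" messages. `SqlWork` and the nested `MetaException` are simplified stand-ins for the real TxnHandler collaborators, not Hive's actual types:

```java
public class OnRenameErrors {

    interface SqlWork { void run() throws java.sql.SQLException; }

    static class MetaException extends Exception {
        MetaException(String msg, Throwable cause) { super(msg, cause); }
    }

    static void onRename(String callSig, SqlWork work) throws MetaException {
        try {
            work.run();
        } catch (java.sql.SQLException e) {
            // Rollback and retryability checks would happen here; the key
            // change is that every failure now propagates to the caller --
            // there is no message-sniffing branch that only logs a warning.
            throw new MetaException("Unable to " + callSig + ": " + e.getMessage(), e);
        }
    }
}
```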
[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost
[ https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=554005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554005 ]

ASF GitHub Bot logged work on HIVE-24710:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 00:51
            Start Date: 18/Feb/21 00:51
    Worklog Time Spent: 10m

Work Description: rbalamohan commented on pull request #1940:
URL: https://github.com/apache/hive/pull/1940#issuecomment-780953443

Thanks for the review @ashutoshc

Issue Time Tracking
-------------------
    Worklog Id: (was: 554005)
    Time Spent: 40m (was: 0.5h)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>          Key: HIVE-24710
>          URL: https://issues.apache.org/jira/browse/HIVE-24710
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Rajesh Balamohan
>     Assignee: Rajesh Balamohan
>     Priority: Major
>       Labels: performance, pull-request-available
>      Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> E.g. query:
> {noformat}
> select x, y, count(*) over (partition by x order by y range between 86400 PRECEDING and CURRENT ROW) r0 from foo
> {noformat}
> 1. In such cases, there is no need to iterate over the row containers so often
> (internally this does O(n^2) operations, taking forever when the window frame
> is really large). This can be optimised to reduce CPU burn and IO.
> 2. BasePartitionEvaluator::calcFunctionValue need not materialize the row when
> the parameters are empty. This codepath can also be optimised.
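The O(n^2) cost comes from re-scanning the row container for every row's window. For the count(*)-over-range case in the description, a two-pointer sweep over rows sorted by the ordering column computes every window count in O(n) total, because the window start only ever moves forward. This is a standalone model assuming distinct ordering values (peer-row semantics of RANGE ... CURRENT ROW are ignored), not Hive's PTF code:

```java
import java.util.Arrays;

public class RangeWindowCount {

    // Counts rows in [y[i] - precedingRange, y[i]] for each i, in O(n) total.
    static long[] windowCounts(long[] sortedY, long precedingRange) {
        long[] counts = new long[sortedY.length];
        int start = 0; // first row still inside the current window
        for (int i = 0; i < sortedY.length; i++) {
            while (sortedY[start] < sortedY[i] - precedingRange) {
                start++; // the window start never moves backward
            }
            counts[i] = i - start + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] y = {0, 10, 20, 100};
        System.out.println(Arrays.toString(windowCounts(y, 15))); // [1, 2, 2, 1]
    }
}
```

The same forward-only idea is what makes a "no need to iterate over the row containers so often" fix possible: each row is entered and left by the window exactly once.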
[jira] [Assigned] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost
[ https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned HIVE-24710:
---------------------------------------
    Assignee: Rajesh Balamohan

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>          Key: HIVE-24710
>          URL: https://issues.apache.org/jira/browse/HIVE-24710
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Rajesh Balamohan
>     Assignee: Rajesh Balamohan
>     Priority: Major
>       Labels: performance, pull-request-available
>      Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
[jira] [Resolved] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost
[ https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan resolved HIVE-24710.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Thanks for the review [~ashutoshc]. Merged the PR.

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>          Key: HIVE-24710
>          URL: https://issues.apache.org/jira/browse/HIVE-24710
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Rajesh Balamohan
>     Priority: Major
>       Labels: performance, pull-request-available
>      Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost
[ https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=554001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554001 ]

ASF GitHub Bot logged work on HIVE-24710:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Feb/21 00:49
            Start Date: 18/Feb/21 00:49
    Worklog Time Spent: 10m

Work Description: rbalamohan merged pull request #1940:
URL: https://github.com/apache/hive/pull/1940

Issue Time Tracking
-------------------
    Worklog Id: (was: 554001)
    Time Spent: 0.5h (was: 20m)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>          Key: HIVE-24710
>          URL: https://issues.apache.org/jira/browse/HIVE-24710
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Rajesh Balamohan
>     Priority: Major
>       Labels: performance, pull-request-available
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24710) Optimise PTF iteration for count(*) to reduce CPU and IO cost
[ https://issues.apache.org/jira/browse/HIVE-24710?focusedWorklogId=553960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553960 ]

ASF GitHub Bot logged work on HIVE-24710:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Feb/21 23:18
            Start Date: 17/Feb/21 23:18
    Worklog Time Spent: 10m

Work Description: ashutoshc commented on pull request #1940:
URL: https://github.com/apache/hive/pull/1940#issuecomment-780917826

+1 LGTM

Issue Time Tracking
-------------------
    Worklog Id: (was: 553960)
    Time Spent: 20m (was: 10m)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>          Key: HIVE-24710
>          URL: https://issues.apache.org/jira/browse/HIVE-24710
>      Project: Hive
>   Issue Type: Improvement
>   Components: HiveServer2
>     Reporter: Rajesh Balamohan
>     Priority: Major
>       Labels: performance, pull-request-available
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24764) insert overwrite on a partition resets row count stats in other partitions
[ https://issues.apache.org/jira/browse/HIVE-24764?focusedWorklogId=553957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553957 ]

ASF GitHub Bot logged work on HIVE-24764:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Feb/21 23:07
            Start Date: 17/Feb/21 23:07
    Worklog Time Spent: 10m

Work Description: ashutoshc commented on pull request #1967:
URL: https://github.com/apache/hive/pull/1967#issuecomment-780913343

+1 LGTM

Issue Time Tracking
-------------------
    Worklog Id: (was: 553957)
    Time Spent: 40m (was: 0.5h)

> insert overwrite on a partition resets row count stats in other partitions
> --------------------------------------------------------------------------
>
>          Key: HIVE-24764
>          URL: https://issues.apache.org/jira/browse/HIVE-24764
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Rajesh Balamohan
>     Priority: Major
>       Labels: pull-request-available
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> After insert overwrite on a partition, stats on other partitions are messed up.
> Subsequent queries end up with plans with PARTIAL stats. In certain cases, this
> leads to suboptimal query plans.
> {noformat}
> drop table if exists test_stats;
> drop table if exists test_stats_2;
> create table test_stats(i int, j bigint);
> create table test_stats_2(i int) partitioned by (j bigint);
> insert into test_stats values (1, 1), (2, 2), (3, 3), (4, 4), (5, NULL);
>
> -- select * from test_stats;
> 1  1
> 2  2
> 3  3
> 4  4
> 5
>
> insert overwrite table test_stats_2 partition(j) select i, j from test_stats where j is not null;
> -- After executing this statement, stats get messed up.
> insert overwrite table test_stats_2 partition(j) select i, j from test_stats where j is null;
>
> -- select * from test_stats_2;
> 1  1
> 2  2
> 3  3
> 4  4
> 5
>
> -- This would return "PARTIAL" stats instead of "COMPLETE"
> explain select i, count(*) as c from test_stats_2 group by i order by c desc limit 10;
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: hive_20210208093110_62ced99e-f068-42d4-9ba8-d45fccd6c0a2:68
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>       DagName: hive_20210208093110_62ced99e-f068-42d4-9ba8-d45fccd6c0a2:68
>       Vertices:
>         Map 1
>           Map Operator Tree:
>             TableScan
>               alias: test_stats_2
>               Statistics: Num rows: 125 Data size: 500 Basic stats: PARTIAL Column stats: COMPLETE
>               Select Operator
>                 expressions: i (type: int)
>                 outputColumnNames: i
>                 Statistics: Num rows: 125 Data size: 500 Basic stats: PARTIAL Column stats: COMPLETE
>                 Group By Operator
>                   aggregations: count()
>                   keys: i (type: int)
>                   minReductionHashAggr: 0.99
>                   mode: hash
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 125 Data size: 1500 Basic stats: PARTIAL Column stats: COMPLETE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int)
>                     null sort order: a
>                     sort order: +
>                     Map-reduce partition columns: _col0 (type: int)
>                     Statistics: Num rows: 125 Data size: 1500 Basic stats: PARTIAL Column stats: COMPLETE
>                     value expressions: _col1 (type: bigint)
>           Execution mode: vectorized, llap
>           LLAP IO: may be used (ACID table)
>         Reducer 2
>           Execution mode: vectorized, llap
>           Reduce Operator Tree:
>             Group By Operator
>               aggregations: count(VALUE._col0)
>               keys: KEY._col0 (type: int)
>               mode: mergepartial
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 62 Data size: 744 Basic stats: PARTIAL Column stats: COMPLETE
>               Top N Key Operator
>                 sort order: -
>                 keys: _col1 (type: bigint)
>                 null sort order: a
>                 Statistics: Num rows: 62 Data size: 744 Basic stats: PARTIAL Column stats: COMPLETE
>
[jira] [Work logged] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?focusedWorklogId=553949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553949 ]

ASF GitHub Bot logged work on HIVE-24786:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Feb/21 22:54
            Start Date: 17/Feb/21 22:54
    Worklog Time Spent: 10m

Work Description: t3rmin4t0r commented on a change in pull request #1983:
URL: https://github.com/apache/hive/pull/1983#discussion_r577852301

## File path: jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java

## @@ -581,21 +596,99 @@ public long getRetryInterval() {
     } else {
       httpClientBuilder = HttpClientBuilder.create();
     }
-    // In case the server's idletimeout is set to a lower value, it might close it's side of
-    // connection. However we retry one more time on NoHttpResponseException
+
+    // Beeline <--> LB <--> Reverse Proxy <-> Hiveserver2
+    // In case of deployments like above, the LoadBalancer (LB) can be configured with Idle Timeout after which the LB
+    // will send TCP RST to Client (Beeline) and Backend (Reverse Proxy). If user is connected to beeline, idle for
+    // sometime and resubmits a query after the idle timeout there is a broken pipe between beeline and LB. When Beeline
+    // tries to submit the query one of two things happen, it either hangs or times out (if socketTimeout is defined in
+    // the jdbc param). The hang is because of the default infinite socket timeout for which there is no auto-recovery
+    // (user have to manually interrupt the query). If the socketTimeout jdbc param was specified, beeline will receive
+    // SocketTimeoutException (Read Timeout) or NoHttpResponseException both of which can be retried if maxRetries is
+    // also specified by the user (jdbc param).
+    // The following retry handler handles the above cases in addition to retries for idempotent and unsent requests.
     httpClientBuilder.setRetryHandler(new HttpRequestRetryHandler() {
+      // This handler is mostly a copy of DefaultHttpRequestRetryHandler except it also retries some exceptions
+      // which could be thrown in certain cases where idle timeout from intermediate proxy triggers a connection reset.
+      private final List<Class<? extends IOException>> nonRetriableClasses = Arrays.asList(
+          InterruptedIOException.class,
+          UnknownHostException.class,
+          ConnectException.class,
+          SSLException.class);
+      // socket exceptions could happen because of timeout, broken pipe or server not responding in which case it is
+      // better to reopen the connection and retry if user specified maxRetries
+      private final List<Class<? extends IOException>> retriableClasses = Arrays.asList(
+          SocketTimeoutException.class,
+          SocketException.class,
+          NoHttpResponseException.class
+      );
+
       @Override
       public boolean retryRequest(IOException exception, int executionCount, HttpContext context) {
-        if (executionCount > 1) {
-          LOG.info("Retry attempts to connect to server exceeded.");
+        Args.notNull(exception, "Exception parameter");
+        Args.notNull(context, "HTTP context");
+        if (executionCount > maxRetries) {
+          // Do not retry if over max retry count
+          LOG.error("Max retries (" + maxRetries + ") exhausted.", exception);
+          return false;
+        }
+        if (this.retriableClasses.contains(exception.getClass())) {
+          LOG.info("Retrying " + exception.getClass() + " as it is in retriable classes list.");
+          return true;
+        }
+        if (this.nonRetriableClasses.contains(exception.getClass())) {
+          LOG.info("Not retrying as the class (" + exception.getClass() + ") is non-retriable class.");
+          return false;
+        } else {
+          for (final Class<? extends IOException> rejectException : this.nonRetriableClasses) {
+            if (rejectException.isInstance(exception)) {
+              LOG.info("Not retrying as the class (" + exception.getClass() + ") is an instance of a non-retriable class.");
+              return false;
+            }
+          }
+        }
+        final HttpClientContext clientContext = HttpClientContext.adapt(context);
+        final HttpRequest request = clientContext.getRequest();
+
+        if (requestIsAborted(request)) {
+          LOG.info("Not retrying as request is aborted.");
           return false;
         }
-        if (exception instanceof org.apache.http.NoHttpResponseException) {
-          LOG.info("Could not connect to the server. Retrying one more time.");
+
+        if (handleAsIdempotent(request)) {
+          LOG.info("Retrying idempotent request. Attempt " + executionCount + " of " + maxRetries);
+          // Retry if the request is considered idempotent
+          return true;
+        }
+
+        if (!clientContext.isRequestSent())
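The ordering in the handler above matters: SocketTimeoutException is a subclass of InterruptedIOException, so the exact-class retriable check must run before the instanceof scan of the non-retriable list, or timeouts would never be retried. A standalone model of just that classification logic (JDK classes only; the real handler also retries NoHttpResponseException and goes on to consult idempotency and request-sent state):

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class RetryDecision {

    private static final List<Class<? extends IOException>> RETRIABLE = Arrays.asList(
        SocketTimeoutException.class, SocketException.class);

    private static final List<Class<? extends IOException>> NON_RETRIABLE = Arrays.asList(
        InterruptedIOException.class, UnknownHostException.class, ConnectException.class);

    static boolean shouldRetry(IOException e, int executionCount, int maxRetries) {
        if (executionCount > maxRetries) {
            return false; // retry budget exhausted
        }
        // Exact-class allow-list first: SocketTimeoutException would otherwise
        // be rejected below as an instance of InterruptedIOException.
        if (RETRIABLE.contains(e.getClass())) {
            return true;
        }
        if (NON_RETRIABLE.contains(e.getClass())) {
            return false;
        }
        for (Class<? extends IOException> c : NON_RETRIABLE) {
            if (c.isInstance(e)) {
                return false;
            }
        }
        // The real handler would continue: retry idempotent or unsent requests.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(new SocketTimeoutException(), 1, 3));    // true
        System.out.println(shouldRetry(new ConnectException("refused"), 1, 3)); // false
    }
}
```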
[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
[ https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553948 ]

ASF GitHub Bot logged work on HIVE-24778:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Feb/21 22:52
            Start Date: 17/Feb/21 22:52
    Worklog Time Spent: 10m

Work Description: zabetak commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r578001413

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java

## @@ -45,7 +45,7 @@ public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
     this.parentResolver = parentResolver;
     SessionState ss = SessionState.get();
-    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment: This means that for all the other kind of strict checks we are vulnerable to conversions happening during init right?

Issue Time Tracking
-------------------
    Worklog Id: (was: 553948)
    Time Spent: 40m (was: 0.5h)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
> ------------------------------------------------------------------------------------
>
>               Key: HIVE-24778
>               URL: https://issues.apache.org/jira/browse/HIVE-24778
>           Project: Hive
>        Issue Type: Sub-task
>  Affects Versions: 4.0.0
>          Reporter: Stamatis Zampetakis
>          Priority: Major
>            Labels: pull-request-available
>           Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by
> {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another
> property, namely {{hive.strict.timestamp.conversion}}, to control the
> implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the
> property covers all strict checks, so having others for specific cases appears
> confusing and can easily lead to unexpected behavior.
> The goal of this issue is to unify those properties to facilitate
> configuration and improve code reuse.
[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations
[ https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553894 ] ASF GitHub Bot logged work on HIVE-24693: - Author: ASF GitHub Bot Created on: 17/Feb/21 21:34 Start Date: 17/Feb/21 21:34 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1938: URL: https://github.com/apache/hive/pull/1938 Replaces #1918 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553894) Time Spent: 6h (was: 5h 50m) > Convert timestamps to zoned times without string operations > --- > > Key: HIVE-24693 > URL: https://issues.apache.org/jira/browse/HIVE-24693 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > > Parquet {{DataWriteableWriter}} relias on {{NanoTimeUtils}} to convert a > timestamp object into a binary value. The way in which it does this,... it > calls {{toString()}} on the timestamp object, and then parses the String. > This particular timestamp do not carry a timezone, so the string is something > like: > {{2021-21-03 12:32:23....}} > The parse code tries to parse the string assuming there is a time zone, and > if not, falls-back and applies the provided "default time zone". As was > noted in [HIVE-24353], if something fails to parse, it is very expensive to > try to parse again. So, for each timestamp in the Parquet file, it: > * Builds a string from the time stamp > * Parses it (throws an exception, parses again) > There is no need to do this kind of string manipulations/parsing, it should > just be using the epoch millis/seconds/time stored internal to the Timestamp > object. 
> {code:java} > // Converts Timestamp to TimestampTZ. > public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) { > return parse(ts.toString(), defaultTimeZone); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
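The improvement the issue asks for can be sketched with plain `java.time`: build the zoned value directly from the timestamp's epoch fields instead of round-tripping through `toString()` and a parser. This is an illustrative sketch only — it uses `java.sql.Timestamp` as a stand-in for Hive's own Timestamp type, and `convert` here is not the actual Hive code:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DirectTimestampConvert {

    // Hypothetical string-free replacement: derive the Instant from the
    // epoch seconds and the nanos-of-second field, then attach the zone.
    public static ZonedDateTime convert(java.sql.Timestamp ts, ZoneId defaultTimeZone) {
        // floorDiv handles pre-epoch (negative) millis correctly.
        Instant instant = Instant.ofEpochSecond(
            Math.floorDiv(ts.getTime(), 1000L), ts.getNanos());
        return instant.atZone(defaultTimeZone);
    }

    public static void main(String[] args) {
        java.sql.Timestamp ts =
            java.sql.Timestamp.valueOf("2021-03-21 12:32:23.123456789");
        System.out.println(convert(ts, ZoneId.of("UTC")));
    }
}
```

No string is ever built or parsed, so the expensive try-parse-with-zone/fail/parse-again path described above disappears entirely.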
[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations
[ https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553886 ] ASF GitHub Bot logged work on HIVE-24693: - Author: ASF GitHub Bot Created on: 17/Feb/21 21:08 Start Date: 17/Feb/21 21:08 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1938: URL: https://github.com/apache/hive/pull/1938 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553886) Time Spent: 5h 50m (was: 5h 40m) > Convert timestamps to zoned times without string operations > --- > > Key: HIVE-24693 > URL: https://issues.apache.org/jira/browse/HIVE-24693 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 5h 50m > Remaining Estimate: 0h > > Parquet {{DataWritableWriter}} relies on {{NanoTimeUtils}} to convert a > timestamp object into a binary value. It does this by calling {{toString()}} > on the timestamp object and then parsing the String. This kind of timestamp > does not carry a timezone, so the string is something > like: > {{2021-21-03 12:32:23....}} > The parse code tries to parse the string assuming there is a time zone, and > if not, falls back and applies the provided "default time zone". As was > noted in [HIVE-24353], if something fails to parse, it is very expensive to > try to parse again. So, for each timestamp in the Parquet file, it: > * Builds a string from the timestamp > * Parses it (throws an exception, parses again) > There is no need for this kind of string manipulation/parsing; it should > just use the epoch millis/seconds/time stored internally in the Timestamp > object. 
> {code:java} > // Converts Timestamp to TimestampTZ. > public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) { > return parse(ts.toString(), defaultTimeZone); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24693) Convert timestamps to zoned times without string operations
[ https://issues.apache.org/jira/browse/HIVE-24693?focusedWorklogId=553885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553885 ] ASF GitHub Bot logged work on HIVE-24693: - Author: ASF GitHub Bot Created on: 17/Feb/21 21:08 Start Date: 17/Feb/21 21:08 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1938: URL: https://github.com/apache/hive/pull/1938#issuecomment-780854241 ` The build of this commit was aborted` Le sigh This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553885) Time Spent: 5h 40m (was: 5.5h) > Convert timestamps to zoned times without string operations > --- > > Key: HIVE-24693 > URL: https://issues.apache.org/jira/browse/HIVE-24693 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Critical > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > > Parquet {{DataWritableWriter}} relies on {{NanoTimeUtils}} to convert a > timestamp object into a binary value. It does this by calling {{toString()}} > on the timestamp object and then parsing the String. This kind of timestamp > does not carry a timezone, so the string is something > like: > {{2021-21-03 12:32:23....}} > The parse code tries to parse the string assuming there is a time zone, and > if not, falls back and applies the provided "default time zone". As was > noted in [HIVE-24353], if something fails to parse, it is very expensive to > try to parse again. 
So, for each timestamp in the Parquet file, it: > * Builds a string from the timestamp > * Parses it (throws an exception, parses again) > There is no need for this kind of string manipulation/parsing; it should > just use the epoch millis/seconds/time stored internally in the Timestamp > object. > {code:java} > // Converts Timestamp to TimestampTZ. > public static TimestampTZ convert(Timestamp ts, ZoneId defaultTimeZone) { > return parse(ts.toString(), defaultTimeZone); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
[ https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553835 ] ASF GitHub Bot logged work on HIVE-24743: - Author: ASF GitHub Bot Created on: 17/Feb/21 19:31 Start Date: 17/Feb/21 19:31 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #1956: URL: https://github.com/apache/hive/pull/1956#discussion_r577886973 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String tableName, public List<Partition> getPartitionsByNames(String dbName, String tableName, List<String> partitionNames) throws HiveException { try { - return getMSC().getPartitionsByNames(dbName, tableName, partitionNames); + GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest(); + req.setDb_name(dbName); + req.setTbl_name(tableName); + req.setNames(partitionNames); + return getPartitionsByNames(req, null); } catch (Exception e) { LOG.error("Failed getPartitionsByNames", e); throw new HiveException(e); } } -public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req) +public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req, + Table table) throws HiveException { try { - Table table = getTable(req.getDb_name(), req.getTbl_name()); + if( table == null ) { +table = getTable(req.getDb_name(), req.getTbl_name()); Review comment: Ok, removed the extra getTable call now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553835) Time Spent: 1.5h (was: 1h 20m) > [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2 > -- > > Key: HIVE-24743 > URL: https://issues.apache.org/jira/browse/HIVE-24743 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > As part of ( HIVE-23821: Send tableId in request for all the new HMS > get_partition APIs ) we added logic to send tableId in the request for > several get_partition APIs, but looks like it was missed out for > getPartitionsByNames. TableId and validWriteIdList are used to maintain > consistency, when HMS API response is being served from a remote cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24788) Backport HIVE-23338 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24788?focusedWorklogId=553834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553834 ] ASF GitHub Bot logged work on HIVE-24788: - Author: ASF GitHub Bot Created on: 17/Feb/21 19:29 Start Date: 17/Feb/21 19:29 Worklog Time Spent: 10m Work Description: h-vetinari commented on pull request #1986: URL: https://github.com/apache/hive/pull/1986#issuecomment-780796536 This is my first time contributing to hive. I was surprised that another CI run appeared 24h later. It fails, but says ``` There are 0 new tests failing, 26 existing failing and 53 skipped. ``` That sounds like the failures are pre-existing? Anything I need to do here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553834) Time Spent: 0.5h (was: 20m) > Backport HIVE-23338 to branch-3.1 > - > > Key: HIVE-24788 > URL: https://issues.apache.org/jira/browse/HIVE-24788 > Project: Hive > Issue Type: Task >Reporter: H. Vetinari >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > jackson has a whole bunch of CVEs open against 2.9.x, which makes working > with HIVE in security aware environments quite difficult. > This has been fixed in HIVE-23338 already, but since 4.0.0 hasn't been > released yet (and is not on the horizon, as far as I can tell), this should > be backported to {{branch-3.1}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
[ https://issues.apache.org/jira/browse/HIVE-24733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286095#comment-17286095 ] Aasha Medhi commented on HIVE-24733: +1 > Handle replication when db location and managed location is set to custom > location on source > > > Key: HIVE-24733 > URL: https://issues.apache.org/jira/browse/HIVE-24733 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > {color:#172b4d} {color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
[ https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553780 ] ASF GitHub Bot logged work on HIVE-24743: - Author: ASF GitHub Bot Created on: 17/Feb/21 18:23 Start Date: 17/Feb/21 18:23 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #1956: URL: https://github.com/apache/hive/pull/1956#discussion_r577841886 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String tableName, public List<Partition> getPartitionsByNames(String dbName, String tableName, List<String> partitionNames) throws HiveException { try { - return getMSC().getPartitionsByNames(dbName, tableName, partitionNames); + GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest(); + req.setDb_name(dbName); + req.setTbl_name(tableName); + req.setNames(partitionNames); + return getPartitionsByNames(req, null); } catch (Exception e) { LOG.error("Failed getPartitionsByNames", e); throw new HiveException(e); } } -public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req) +public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req, + Table table) throws HiveException { try { - Table table = getTable(req.getDb_name(), req.getTbl_name()); + if( table == null ) { +table = getTable(req.getDb_name(), req.getTbl_name()); Review comment: TableId is still required for the remote cache to decide whether it can serve the data from cache or has to refresh it from HMS. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553780) Time Spent: 1h 20m (was: 1h 10m) > [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2 > -- > > Key: HIVE-24743 > URL: https://issues.apache.org/jira/browse/HIVE-24743 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > As part of ( HIVE-23821: Send tableId in request for all the new HMS > get_partition APIs ) we added logic to send tableId in the request for > several get_partition APIs, but looks like it was missed out for > getPartitionsByNames. TableId and validWriteIdList are used to maintain > consistency, when HMS API response is being served from a remote cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
[ https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553779 ] ASF GitHub Bot logged work on HIVE-24743: - Author: ASF GitHub Bot Created on: 17/Feb/21 18:20 Start Date: 17/Feb/21 18:20 Worklog Time Spent: 10m Work Description: yongzhi commented on a change in pull request #1956: URL: https://github.com/apache/hive/pull/1956#discussion_r57783 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String tableName, public List<Partition> getPartitionsByNames(String dbName, String tableName, List<String> partitionNames) throws HiveException { try { - return getMSC().getPartitionsByNames(dbName, tableName, partitionNames); + GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest(); + req.setDb_name(dbName); + req.setTbl_name(tableName); + req.setNames(partitionNames); + return getPartitionsByNames(req, null); } catch (Exception e) { LOG.error("Failed getPartitionsByNames", e); throw new HiveException(e); } } -public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req) +public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req, + Table table) throws HiveException { try { - Table table = getTable(req.getDb_name(), req.getTbl_name()); + if( table == null ) { +table = getTable(req.getDb_name(), req.getTbl_name()); Review comment: If you do not use the cached client, you do not need the table id; it will be different for clients that use the HMS APIs directly (no need to call getTable). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553779) Time Spent: 1h 10m (was: 1h) > [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2 > -- > > Key: HIVE-24743 > URL: https://issues.apache.org/jira/browse/HIVE-24743 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > As part of ( HIVE-23821: Send tableId in request for all the new HMS > get_partition APIs ) we added logic to send tableId in the request for > several get_partition APIs, but looks like it was missed out for > getPartitionsByNames. TableId and validWriteIdList are used to maintain > consistency, when HMS API response is being served from a remote cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
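The change under review threads an optional, already-resolved `Table` through the call so the metastore is hit only when the caller could not supply one. The shape of that pattern can be sketched with stand-in types and a call counter in place of real HMS traffic (nothing below is actual Hive code; names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class TableLookup {
    // Stand-in for the metastore: maps "db.table" to a table id.
    private final Map<String, Long> metastore = new HashMap<>();
    // Counts simulated HMS round-trips, to show the extra call is avoided.
    public int remoteCalls = 0;

    public void register(String qualifiedName, long tableId) {
        metastore.put(qualifiedName, tableId);
    }

    public Long getTable(String db, String tbl) {
        remoteCalls++; // each call here models one remote HMS fetch
        return metastore.get(db + "." + tbl);
    }

    // Mirrors the reviewed pattern: only resolve the table when the caller
    // did not already supply it, so no redundant round-trip is made.
    public Long resolveTableId(String db, String tbl, Long alreadyResolved) {
        if (alreadyResolved == null) {
            alreadyResolved = getTable(db, tbl);
        }
        return alreadyResolved;
    }

    public static void main(String[] args) {
        TableLookup lookup = new TableLookup();
        lookup.register("default.t1", 42L);
        lookup.resolveTableId("default", "t1", null); // one fetch
        lookup.resolveTableId("default", "t1", 42L);  // no fetch
        System.out.println("remote calls: " + lookup.remoteCalls);
    }
}
```

The same idea underlies the review back-and-forth above: callers that already hold the table (and thus its tableId) should not pay for a second `getTable`.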
[jira] [Updated] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
[ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Doneriya updated HIVE-21915: --- Description: The HQL syntax is like this: CREATE TEMPORARY TABLE tez_union_all_loss_data AS SELECT xxx, yyy, zzz,1 as tag FROM ods_1 UNION ALL SELECT xxx, yyy, zzz, tag FROM ( SELECT xxx ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy ,zzz ,2 as tag FROM ods_2 LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb ) tbl ; With the above HQL, we expect rows with both tag = 2 and tag = 1 to appear. In our case, however, all the rows with tag = 1 are lost. Digging deeper, we can find that the two generated maps have identical task tmp paths. That results from the fact that, when a UDTF is present, the FileSinkOperator will be processed twice while generating the tmp path in GenTezUtils.removeUnionOperators(); was: The HQL syntax is like this: CREATE TEMPORARY TABLE tez_union_all_loss_data AS SELECT xxx, yyy, zzz,1 as tag FROM ods_1 UNION ALL SELECT xxx, yyy, zzz, tag FROM ( SELECT xxx ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy ,zzz ,2 as tag FROM ods_2 LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb ) tbl ; With the above HQL, we expect rows with both tag = 2 and tag = 1 to appear. In our case, however, all the rows with tag = 1 are lost. Digging deeper, we can find that the two generated maps have identical task tmp paths. 
That results from the fact that, when a UDTF is present, the FileSinkOperator will be processed twice while generating the tmp path in GenTezUtils.removeUnionOperators(); > Hive with TEZ UNION ALL and UDTF results in data loss > - > > Key: HIVE-21915 > URL: https://issues.apache.org/jira/browse/HIVE-21915 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.1 >Reporter: Wei Zhang >Assignee: Wei Zhang >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21915.01.patch, HIVE-21915.02.patch, > HIVE-21915.03.patch, HIVE-21915.04.patch > > > The HQL syntax is like this: > CREATE TEMPORARY TABLE tez_union_all_loss_data AS > SELECT xxx, yyy, zzz,1 as tag > FROM ods_1 > UNION ALL > SELECT xxx, yyy, zzz, tag > FROM > ( > SELECT xxx > ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy > ,zzz > ,2 as tag > FROM ods_2 > LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb > ) tbl > ; > > With the above HQL, we expect rows with both tag = 2 and tag = 1 > to appear. In our case, however, all the rows with tag = 1 are lost. > Digging deeper, we can find that the two generated maps have identical task tmp > paths. That results from the fact that, when a UDTF is present, the FileSinkOperator will > be processed twice while generating the tmp path in > GenTezUtils.removeUnionOperators(); > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally
[ https://issues.apache.org/jira/browse/HIVE-24775?focusedWorklogId=553740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553740 ] ASF GitHub Bot logged work on HIVE-24775: - Author: ASF GitHub Bot Created on: 17/Feb/21 17:01 Start Date: 17/Feb/21 17:01 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1981: URL: https://github.com/apache/hive/pull/1981#discussion_r50512 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) { projExprs.add(rightRef); joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, Review comment: Instead of doing the transformation in the rewrite to AST method, let's do it here. That will be more consistent and decrease rewriting at the AST level. In particular, this should be `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`. ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) { projExprs.add(rightRef); joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, ImmutableList.of(leftRef, rightRef))); - filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, Review comment: In this case, this filter condition should be the same one that it is introduced for the join operator (with `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553740) Time Spent: 0.5h (was: 20m) > Incorrect null handling when rebuilding Materialized view incrementally > --- > > Key: HIVE-24775 > URL: https://issues.apache.org/jira/browse/HIVE-24775 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc > TBLPROPERTIES ('transactional'='true'); > INSERT INTO t1 VALUES > (NULL, 'null_value', 100.77, 7), > (1, 'calvin', 978.76, 3), > (1, 'charlie', 9.8, 1); > CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS > SELECT a, b, sum(d) > FROM t1 > WHERE c > 10.0 > GROUP BY a, b; > INSERT INTO t1 VALUES > (NULL, 'null_value', 100.88, 8), > (1, 'charlie', 15.8, 1); > ALTER MATERIALIZED VIEW mat1 REBUILD; > SELECT * FROM mat1 > ORDER BY a, b; > {code} > View contains: > {code} > 1 calvin 3 > 1 charlie 1 > NULL null_value 8 > NULL null_value 7 > {code} > but it should contain: > {code} > 1 calvin 3 > 1 charlie 1 > NULL null_value 15 > {code} > Rows with aggregate key columns having NULL values are not aggregated because > incremental materialized view rebuild plan is altered by > [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]: > IS NOT NULL filter added for each of these columns on top of the view scan > when joining with the branch pulls the rows inserted after the last rebuild: > {code} > HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, > +($5, $2))]) > HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), > =($1, $4)))]) > HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], > 
algorithm=[none], cost=[not available]) > HiveProject(a=[$0], b=[$1], _c2=[$2]) > HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))]) > HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1]) > HiveProject(a=[$0], b=[$1], $f2=[$2]) > HiveAggregate(group=[{0, 1}], agg#0=[sum($3)]) > HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))]) > HiveTableScan(table=[[default, t1]], table:alias=[t1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
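The review converges on replacing plain equality with SQL's null-safe comparison, `IS NOT DISTINCT FROM` (often written `<=>`), so that rows whose group-by key is NULL still match their counterpart in the materialized view. Its semantics can be pinned down in a few lines of stdlib Java — this is a sketch of the predicate itself, not of the Calcite rewrite:

```java
import java.util.Objects;

public class NullSafeEquals {

    // SQL IS NOT DISTINCT FROM: NULL <=> NULL is true, NULL <=> x is false,
    // x <=> y falls back to ordinary equality.
    public static boolean isNotDistinctFrom(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Plain SQL '=' evaluates to NULL (treated as false) when either
        // side is NULL, which is why rows grouped on a NULL key never
        // matched during the incremental rebuild and were re-inserted
        // instead of merged.
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isNotDistinctFrom(null, "x"));  // false
        System.out.println(isNotDistinctFrom("x", "x"));   // true
    }
}
```

With `=` in the join condition, the `NULL null_value` group from the view and the `NULL null_value` delta never join, producing the duplicate rows (7 and 8) shown in the bug report instead of the merged sum 15.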
[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally
[ https://issues.apache.org/jira/browse/HIVE-24775?focusedWorklogId=553738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553738 ] ASF GitHub Bot logged work on HIVE-24775: - Author: ASF GitHub Bot Created on: 17/Feb/21 17:00 Start Date: 17/Feb/21 17:00 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1981: URL: https://github.com/apache/hive/pull/1981#discussion_r577759809 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) { projExprs.add(rightRef); joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, ImmutableList.of(leftRef, rightRef))); - filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, - ImmutableList.of(leftRef))); } // 3) Add the expressions that correspond to the aggregation // functions -RexNode caseFilterCond = RexUtil.composeConjunction(rexBuilder, filterConjs); +List filterConjs = new ArrayList<>(); for (int i = 0, leftPos = groupCount, rightPos = totalCount + groupCount; leftPos < totalCount; i++, leftPos++, rightPos++) { // case when mv2.deptno IS NULL AND mv2.deptname IS NULL then s else source.s + mv2.s end RexNode leftRef = rexBuilder.makeInputRef( joinLeftInput.getRowType().getFieldList().get(leftPos).getType(), leftPos); + filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, Review comment: This `IS_NULL` filter should not be here since we do not have any guarantees the aggregate results may or may not produce nulls? ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) { projExprs.add(rightRef); joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, Review comment: Instead of doing the transformation in the rewrite to AST method, let's do it here. 
That will be more consistent and decrease rewriting at the AST level. In particular, this should be `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`. ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -158,7 +157,8 @@ public void onMatch(RelOptRuleCall call) { + " recognized: " + aggCall); } projExprs.add(rexBuilder.makeCall(SqlStdOperatorTable.CASE, - ImmutableList.of(caseFilterCond, rightRef, elseReturn))); + ImmutableList.of(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, + ImmutableList.of(leftRef)), rightRef, elseReturn))); } RexNode joinCond = RexUtil.composeConjunction(rexBuilder, joinConjs); RexNode filterCond = RexUtil.composeConjunction(rexBuilder, filterConjs); Review comment: The .OR condition below would change too. Each branch of the OR is supposed to filter either the insert or the update operation. Can we use `NOT` on top of the join condition to create the second disjunct for the OR? ``` // (mv2.deptno <=> source.deptno AND mv2.deptname <=> source.deptname) // OR NOT(mv2.deptno <=> source.deptno AND mv2.deptname <=> source.deptname) ``` ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) { projExprs.add(rightRef); joinConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, ImmutableList.of(leftRef, rightRef))); - filterConjs.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, Review comment: In this case, this filter condition should be the same one that is introduced for the join operator (with `SqlStdOperatorTable.IS_NOT_DISTINCT_FROM`) ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -1118,6 +1118,18 @@ Table materializeCTE(String cteName, CTEClause cte) throws HiveException { } private void fixUpASTAggregateIncrementalRebuild(ASTNode newAST) throws SemanticException { +// Replace equality operators with
null safe equality operators in join condition Review comment: This will probably change if the handling is done above. I am hoping the method can stay almost as it was, except for the condition in old L1255 to infer the insert vs update branch, which could probably be done based on `NOT`? ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveAggregateIncrementalRewritingRule.java ## @@ -117,17 +116,17 @@ public void onMatch(RelOptRuleCall call) {
[jira] [Work logged] (HIVE-24733) Handle replication when db location and managed location is set to custom location on source
[ https://issues.apache.org/jira/browse/HIVE-24733?focusedWorklogId=553721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553721 ] ASF GitHub Bot logged work on HIVE-24733: - Author: ASF GitHub Bot Created on: 17/Feb/21 16:38 Start Date: 17/Feb/21 16:38 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1942: URL: https://github.com/apache/hive/pull/1942#discussion_r577765531 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java ## @@ -485,6 +485,9 @@ private Task getReplLoadRootTask(String sourceDb, String replicadb, boolean isIn metricCollector = new BootstrapLoadMetricCollector(replicadb, tuple.dumpLocation, 0, confTemp); } +/* When 'hive.repl.retain.custom.db.locations.on.target' is enabled, the first iteration of repl load would + run only database creation task, and only in next iteration of Repl Load Task execution, remaining tasks will be + executed. Hence disabling this to perform the test on task optimization. */ Review comment: Why is this set to false in BaseReplicationScenarios This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553721) Time Spent: 1h 20m (was: 1h 10m) > Handle replication when db location and managed location is set to custom > location on source > > > Key: HIVE-24733 > URL: https://issues.apache.org/jira/browse/HIVE-24733 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > {color:#172b4d} {color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24728) Low level reader for llap cache hydration
[ https://issues.apache.org/jira/browse/HIVE-24728?focusedWorklogId=553712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553712 ] ASF GitHub Bot logged work on HIVE-24728: - Author: ASF GitHub Bot Created on: 17/Feb/21 16:30 Start Date: 17/Feb/21 16:30 Worklog Time Spent: 10m Work Description: asinkovits opened a new pull request #1990: URL: https://github.com/apache/hive/pull/1990 ### What changes were proposed in this pull request? This is a subtask for the cache hydration feature, it provides a way to read ORC files based on already calculated positions. ### Why are the changes needed? LLAP cache hydration will enable save/load the cache contents. The buffer positions were already calculated, so we need a way to read and load them into the cache. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual tests were conducted. q tests will be added when the feature is ready. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553712) Remaining Estimate: 0h Time Spent: 10m > Low level reader for llap cache hydration > - > > Key: HIVE-24728 > URL: https://issues.apache.org/jira/browse/HIVE-24728 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24728) Low level reader for llap cache hydration
[ https://issues.apache.org/jira/browse/HIVE-24728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24728: -- Labels: pull-request-available (was: ) > Low level reader for llap cache hydration > - > > Key: HIVE-24728 > URL: https://issues.apache.org/jira/browse/HIVE-24728 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24725) Collect top priority items from llap cache policy
[ https://issues.apache.org/jira/browse/HIVE-24725?focusedWorklogId=553704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553704 ] ASF GitHub Bot logged work on HIVE-24725: - Author: ASF GitHub Bot Created on: 17/Feb/21 16:10 Start Date: 17/Feb/21 16:10 Worklog Time Spent: 10m Work Description: asinkovits commented on a change in pull request #1947: URL: https://github.com/apache/hive/pull/1947#discussion_r577741830 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -4564,6 +4564,9 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "The meaning of this parameter is the inverse of the number of time ticks (cache\n" + " operations, currently) that cause the combined recency-frequency of a block in cache\n" + " to be halved."), +LLAP_LRFU_CUTOFF_PERCENTAGE("hive.llap.io.lrfu.cutoff.percentage", 0.10f, Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553704) Time Spent: 40m (was: 0.5h) > Collect top priority items from llap cache policy > - > > Key: HIVE-24725 > URL: https://issues.apache.org/jira/browse/HIVE-24725 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
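The new `hive.llap.io.lrfu.cutoff.percentage` setting implies selecting the top fraction of cache buffers by LRFU priority. A generic way to collect the best `ceil(cutoff * n)` items is a bounded min-heap, O(n log k); the sketch below is hypothetical — names and types are not from the LLAP code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopPriorityItems {

    // Returns the highest-priority ceil(cutoff * n) values, descending.
    // A min-heap of size k keeps only candidates that beat its minimum.
    public static List<Double> topByCutoff(List<Double> priorities, double cutoff) {
        int k = Math.max(1, (int) Math.ceil(priorities.size() * cutoff));
        PriorityQueue<Double> minHeap = new PriorityQueue<>(k);
        for (double p : priorities) {
            if (minHeap.size() < k) {
                minHeap.add(p);
            } else if (minHeap.peek() < p) {
                minHeap.poll(); // drop the current weakest survivor
                minHeap.add(p);
            }
        }
        List<Double> result = new ArrayList<>(minHeap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> prios = List.of(1.0, 5.0, 3.0, 9.0, 2.0, 8.0, 4.0, 7.0, 6.0, 10.0);
        System.out.println(topByCutoff(prios, 0.30)); // top 30% of 10 items
    }
}
```

With the default cutoff of 0.10, a cache of n buffers would surface only its top 10% — the set worth persisting for later hydration.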
[jira] [Work logged] (HIVE-24726) Track required data for cache hydration
[ https://issues.apache.org/jira/browse/HIVE-24726?focusedWorklogId=553700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553700 ]

ASF GitHub Bot logged work on HIVE-24726:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 16:07
Start Date: 17/Feb/21 16:07
Worklog Time Spent: 10m
Work Description: asinkovits commented on a change in pull request #1961:
URL: https://github.com/apache/hive/pull/1961#discussion_r577739515

## File path: llap-server/src/java/org/apache/hadoop/hive/llap/cache/MemoryLimitedPathCache.java
## @@ -0,0 +1,60 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.llap.cache;
+
+import com.google.common.cache.Cache;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.Weigher;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Memory limited version of the path cache.
+ */
+public class MemoryLimitedPathCache implements PathCache {
+
+  private static final Logger LOG = LoggerFactory.getLogger(MemoryLimitedPathCache.class);
+  private Cache<Object, String> internalCache;
+
+  public MemoryLimitedPathCache(Configuration conf) {
+    internalCache = CacheBuilder.newBuilder()
+        .maximumWeight(HiveConf.getSizeVar(conf, HiveConf.ConfVars.LLAP_IO_PATH_CACHE_SIZE))
+        .weigher(new PathWeigher())
+        .build();
+  }
+
+  @Override
+  public void touch(Object key, String val) {
+    if (key != null) {
+      internalCache.put(key, val);
+    }
+  }
+
+  @Override
+  public String resolve(Object key) {
+    return key != null ? internalCache.getIfPresent(key) : null;
+  }
+
+  private static class PathWeigher implements Weigher<Object, String> {
+
+    @Override
+    public int weigh(Object key, String value) {
+      // String memory footprint

Review comment: Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 553700)
    Time Spent: 40m  (was: 0.5h)

> Track required data for cache hydration
> ---------------------------------------
>
>                 Key: HIVE-24726
>                 URL: https://issues.apache.org/jira/browse/HIVE-24726
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
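[Editor's note] The Guava-backed `MemoryLimitedPathCache` reviewed above bounds the cache by accumulated entry weight rather than entry count. A minimal pure-Java sketch of the same idea — hypothetical illustration only, not Hive code, using an access-ordered `LinkedHashMap` and a rough per-entry weight approximating a String's memory footprint — might look like:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Hive code): a weight-bounded path cache in the
// spirit of MemoryLimitedPathCache. Entries are evicted least-recently-used
// first once the total weight exceeds the configured maximum.
class WeightBoundedPathCache {
  private final long maxWeight;
  private long currentWeight;
  // Access-ordered map: iteration visits least-recently-used entries first.
  private final LinkedHashMap<Object, String> map =
      new LinkedHashMap<>(16, 0.75f, true);

  WeightBoundedPathCache(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  // Rough String footprint: ~2 bytes per char plus fixed object overhead.
  private static int weigh(String value) {
    return 40 + 2 * value.length();
  }

  synchronized void touch(Object key, String val) {
    if (key == null) {
      return;
    }
    String old = map.put(key, val);
    if (old != null) {
      currentWeight -= weigh(old);
    }
    currentWeight += weigh(val);
    // Evict least-recently-used entries until back under the weight limit.
    Iterator<Map.Entry<Object, String>> it = map.entrySet().iterator();
    while (currentWeight > maxWeight && it.hasNext()) {
      Map.Entry<Object, String> e = it.next();
      currentWeight -= weigh(e.getValue());
      it.remove();
    }
  }

  synchronized String resolve(Object key) {
    return key == null ? null : map.get(key);
  }
}
```

Guava's `CacheBuilder.maximumWeight(...).weigher(...)` performs the equivalent bookkeeping (with a more sophisticated eviction policy) without hand-rolled eviction code.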
[jira] [Work logged] (HIVE-24726) Track required data for cache hydration
[ https://issues.apache.org/jira/browse/HIVE-24726?focusedWorklogId=553699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553699 ] ASF GitHub Bot logged work on HIVE-24726: - Author: ASF GitHub Bot Created on: 17/Feb/21 16:07 Start Date: 17/Feb/21 16:07 Worklog Time Spent: 10m Work Description: asinkovits commented on a change in pull request #1961: URL: https://github.com/apache/hive/pull/1961#discussion_r577739161 ## File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java ## @@ -112,6 +115,8 @@ private final BufferUsageManager bufferManager; private final Configuration daemonConf; private final LowLevelCacheMemoryManager memoryManager; + private LowLevelCachePolicy realCachePolicy; Review comment: Both will be required in HIVE-24727, I've reverted them for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553699) Time Spent: 0.5h (was: 20m) > Track required data for cache hydration > --- > > Key: HIVE-24726 > URL: https://issues.apache.org/jira/browse/HIVE-24726 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
[ https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=553691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553691 ] ASF GitHub Bot logged work on HIVE-24739: - Author: ASF GitHub Bot Created on: 17/Feb/21 15:54 Start Date: 17/Feb/21 15:54 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1946: URL: https://github.com/apache/hive/pull/1946 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553691) Time Spent: 5h 10m (was: 5h) > Clarify Usage of Thrift TServerEventHandler and Count Number of Messages > Processed > -- > > Key: HIVE-24739 > URL: https://issues.apache.org/jira/browse/HIVE-24739 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Make the messages emitted from {{TServerEventHandler}} more meaningful. > Also, track the number of messages that each client sends to aid in > troubleshooting. > I run into this issue all the time with and this would greatly help clarify > the logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed
[ https://issues.apache.org/jira/browse/HIVE-24739?focusedWorklogId=553688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553688 ] ASF GitHub Bot logged work on HIVE-24739: - Author: ASF GitHub Bot Created on: 17/Feb/21 15:44 Start Date: 17/Feb/21 15:44 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1946: URL: https://github.com/apache/hive/pull/1946 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553688) Time Spent: 5h (was: 4h 50m) > Clarify Usage of Thrift TServerEventHandler and Count Number of Messages > Processed > -- > > Key: HIVE-24739 > URL: https://issues.apache.org/jira/browse/HIVE-24739 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Make the messages emitted from {{TServerEventHandler}} more meaningful. > Also, track the number of messages that each client sends to aid in > troubleshooting. > I run into this issue all the time with and this would greatly help clarify > the logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
[ https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553677 ]

ASF GitHub Bot logged work on HIVE-24778:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 15:27
Start Date: 17/Feb/21 15:27
Worklog Time Spent: 10m
Work Description: pgaref commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r577705382

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
## @@ -45,7 +45,7 @@ public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
     this.parentResolver = parentResolver;
     SessionState ss = SessionState.get();
-    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment: Hey @zabetak -- had a chat with @kgyrtkirk regarding this. It seems UDFs do not communicate their accepted format(s), as they may introduce converters/bridges/etc. during initialization. This is why HIVE-24157 introduced this restriction as part of the resolver.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 553677)
    Time Spent: 0.5h  (was: 20m)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-24778
>                 URL: https://issues.apache.org/jira/browse/HIVE-24778
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by the
> {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another
> property, namely {{hive.strict.timestamp.conversion}}, to control the
> implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the
> property covers all strict checks, so having others for specific cases appears
> confusing and can easily lead to unexpected behavior.
> The goal of this issue is to unify those properties to facilitate
> configuration and improve code reuse.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
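[Editor's note] The strict type-safety guard being unified in HIVE-24778 can be sketched in isolation. The enum and method names below are illustrative, not Hive's actual API; the real check lives in the UDF initialization paths shown in the review diffs:

```java
// Illustrative sketch (names hypothetical): reject a NUMERIC-to-TIMESTAMP
// conversion when a single strict type-safety flag is enabled, mirroring the
// behavior that hive.strict.checks.type.safety is meant to cover.
class StrictCastCheckSketch {
  enum Group { NUMERIC, DATE_TIME, STRING }

  static void checkCast(Group from, Group to, boolean strictTypeSafety) {
    if (strictTypeSafety && from == Group.NUMERIC && to == Group.DATE_TIME) {
      throw new IllegalArgumentException(
          "Casting NUMERIC types to TIMESTAMP is prohibited (strict type safety)");
    }
    // Other conversions pass through; non-strict mode allows everything.
  }
}
```

Folding the timestamp rule under one flag, as the issue proposes, means a single configuration point governs all such rejections instead of two overlapping properties.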
[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553676 ] ASF GitHub Bot logged work on HIVE-24782: - Author: ASF GitHub Bot Created on: 17/Feb/21 15:26 Start Date: 17/Feb/21 15:26 Worklog Time Spent: 10m Work Description: lcspinter commented on pull request #1989: URL: https://github.com/apache/hive/pull/1989#issuecomment-780632859 Thanks for the patch @jayp12323. Submitted to master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553676) Time Spent: 0.5h (was: 20m) > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553675 ] ASF GitHub Bot logged work on HIVE-24782: - Author: ASF GitHub Bot Created on: 17/Feb/21 15:26 Start Date: 17/Feb/21 15:26 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #1989: URL: https://github.com/apache/hive/pull/1989 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553675) Time Spent: 20m (was: 10m) > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch > > Time Spent: 20m > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285843#comment-17285843 ] Jason Phelps commented on HIVE-24782: - Forgot to bring the comment along. Uploaded another patch > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Phelps updated HIVE-24782: Attachment: HIVE-24782-002.patch > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24782-001.patch, HIVE-24782-002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?focusedWorklogId=553622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553622 ] ASF GitHub Bot logged work on HIVE-24782: - Author: ASF GitHub Bot Created on: 17/Feb/21 14:21 Start Date: 17/Feb/21 14:21 Worklog Time Spent: 10m Work Description: jayp12323 opened a new pull request #1989: URL: https://github.com/apache/hive/pull/1989 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553622) Remaining Estimate: 0h Time Spent: 10m > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Attachments: HIVE-24782-001.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24782) Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components
[ https://issues.apache.org/jira/browse/HIVE-24782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24782: -- Labels: pull-request-available (was: ) > Fix in HIVE-22889 introduced NPE when using non-WebHCat HCat components > --- > > Key: HIVE-24782 > URL: https://issues.apache.org/jira/browse/HIVE-24782 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Jason Phelps >Assignee: Jason Phelps >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24782-001.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In HIVE-22889, it introduced the following lines: > {code:java} > // remove the leading and trailing quotes. hcatalog can miss on some > cases. > if (execString.length() > 1 && execString.startsWith("\"") && > execString.endsWith("\"")) { > execString = execString.substring(1, execString.length() - 1); > } > {code} > When calling Sqoop HCat jobs, or the HCat CLI, it will throw an NPE because > execString is null but not wrapped in the handy Null check that is nearby > {code:java} > if (execString != null) { > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
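[Editor's note] The HIVE-24782 fix amounts to guarding the quote-stripping logic with the nearby null check. A standalone sketch of the corrected logic — the class and method names here are hypothetical; the actual patch lives in HCatalog's CLI handling:

```java
// Hypothetical standalone sketch of the null-safe quote stripping described
// in HIVE-24782 (not the actual HCatalog patch).
class QuoteStripSketch {
  static String stripQuotes(String execString) {
    // Guard against null first: Sqoop HCat jobs and the HCat CLI may pass no
    // exec string at all, which is what triggered the NPE before the fix.
    if (execString != null && execString.length() > 1
        && execString.startsWith("\"") && execString.endsWith("\"")) {
      return execString.substring(1, execString.length() - 1);
    }
    return execString;
  }
}
```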
[jira] [Work logged] (HIVE-24786) JDBC HttpClient should retry for idempotent and unsent http methods
[ https://issues.apache.org/jira/browse/HIVE-24786?focusedWorklogId=553592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553592 ] ASF GitHub Bot logged work on HIVE-24786: - Author: ASF GitHub Bot Created on: 17/Feb/21 12:58 Start Date: 17/Feb/21 12:58 Worklog Time Spent: 10m Work Description: prasanthj commented on pull request #1983: URL: https://github.com/apache/hive/pull/1983#issuecomment-780539112 @szlta Thanks for the review! I tested the PR on different environment and ran into different set of issues (likely because of OS difference and JDK difference). Encountered a hang issue under some scenarios (socket read timeout). The hang is because there was no timeout defined on the socket created by httpclient and hence it will infinitely wait until server can write something to it (which will never happen because LB disconnected the connection to server). HIVE-12371 provided a way to specify socketTimeout via JDBC param but that was applied only for binary transport and not for http. I made couple of changes to the PR and tested it to make sure the hang never happens 1) For httpmode use the socketTimeout specified by the user via jdbc param 2) The default retry handler does not retry InterruptedIOException. Since SocketTimeoutException (read timeout) is subclass of InterruptedIOException it wasn't retried. So added a custom retry handler with list of classes that will be retried in addition to the default retry (idempotent and unsent requests). Could you please take another look at the PR as it has changed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 553592) Time Spent: 20m (was: 10m) > JDBC HttpClient should retry for idempotent and unsent http methods > --- > > Key: HIVE-24786 > URL: https://issues.apache.org/jira/browse/HIVE-24786 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When hiveserver2 is behind multiple proxies there is possibility of "broken > pipe", "connect timeout" and "read timeout" exceptions if one of the > intermediate proxies or load balancers decided to reset the underlying tcp > socket after idle timeout. When the connection is broken and when the query > is submitted after idle timeout from beeline (or client) perspective the > connection is open but http methods (POST/GET) fails with socket related > exceptions. Since these methods are not sent to the server these are safe for > client side retries. > > Also HIVE-12371 seems to apply the socket timeout only to binary transport. > Same can be passed on to http client as well to avoid retry hang issues with > infinite timeouts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
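[Editor's note] The retry behavior described in HIVE-24786 — retry idempotent/unsent requests, and additionally retry read timeouts even though `SocketTimeoutException` is a subclass of `InterruptedIOException` that default handlers refuse to retry — can be sketched independently of Apache HttpClient. All names below are illustrative, not the actual patch:

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

// Illustrative sketch (not the actual HIVE-24786 patch) of the retry decision:
// the default handler treats every InterruptedIOException as non-retriable,
// which silently excludes SocketTimeoutException (read timeout), so that case
// is special-cased first.
class RetryDecisionSketch {
  static boolean shouldRetry(IOException cause, boolean requestSentToServer,
      int executionCount, int maxRetries) {
    if (executionCount > maxRetries) {
      return false; // retry budget exhausted
    }
    // Read timeouts are retried explicitly even though they are
    // InterruptedIOExceptions, which a default handler would reject.
    if (cause instanceof SocketTimeoutException) {
      return true;
    }
    if (cause instanceof InterruptedIOException) {
      return false;
    }
    // A request that never reached the server (e.g. broken pipe on send
    // after an LB reset the idle connection) is safe to resend.
    return !requestSentToServer;
  }
}
```

Pairing this with a finite socket timeout (rather than the infinite default) is what prevents the hang described above: the read times out, and the timeout is then eligible for retry.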
[jira] [Work logged] (HIVE-24751) Workload Manager sees `No privilege` exception even when authorization is not enabled
[ https://issues.apache.org/jira/browse/HIVE-24751?focusedWorklogId=553589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553589 ]

ASF GitHub Bot logged work on HIVE-24751:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 12:52
Start Date: 17/Feb/21 12:52
Worklog Time Spent: 10m
Work Description: ashish-kumar-sharma commented on a change in pull request #1964:
URL: https://github.com/apache/hive/pull/1964#discussion_r577585597

## File path: service/src/java/org/apache/hive/service/server/KillQueryImpl.java
## @@ -116,9 +117,21 @@ public static void killChildYarnJobs(Configuration conf, String tag, String doAs
   private static boolean isAdmin() {
     boolean isAdmin = false;
-    if (SessionState.get().getAuthorizerV2() != null) {
+    SessionState ss = SessionState.get();
+    if(!HiveConf.getBoolVar(ss.getConf(), HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {

Review comment: space between if and (

## File path: service/src/java/org/apache/hive/service/server/KillQueryImpl.java
## @@ -116,9 +117,21 @@ public static void killChildYarnJobs(Configuration conf, String tag, String doAs
   private static boolean isAdmin() {
     boolean isAdmin = false;
-    if (SessionState.get().getAuthorizerV2() != null) {
+    SessionState ss = SessionState.get();
+    if(!HiveConf.getBoolVar(ss.getConf(), HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {
+      // If authorization is disabled, hs2 process owner should have kill privileges
       try {
-        SessionState.get().getAuthorizerV2()
+        String currentUser = ss.getUserName();
+        String loginUser = UserGroupInformation.getCurrentUser().getShortUserName();
+        return (currentUser != null) && currentUser.equals(loginUser);

Review comment: if loginUser will never be null then use "return StringUtils.equals(currentUser, loginUser)". this will handle null values also.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 553589)
    Time Spent: 20m  (was: 10m)

> Workload Manager sees `No privilege` exception even when authorization is not
> enabled
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24751
>                 URL: https://issues.apache.org/jira/browse/HIVE-24751
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Nikhil Gupta
>            Assignee: Nikhil Gupta
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> At present it is not checked whether authorization is enabled or not for Kill
> Query access.
> This causes Workload Manager thread to end up with No privilege Exception
> when trying to kill a query in an environment where authorization is disabled.
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: No privilege
>  at org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:188)
>  at org.apache.hadoop.hive.ql.exec.tez.WorkloadManager.lambda$scheduleWork$3(WorkloadManager.java:454)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hive.service.cli.HiveSQLException: No privilege
>  at org.apache.hive.service.server.KillQueryImpl.killQuery(KillQueryImpl.java:167)
>  ... 6 more{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
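[Editor's note] The reviewer's null-safety suggestion above can be sketched with the JDK's `java.util.Objects.equals`, the standard-library analogue of commons-lang `StringUtils.equals`. Note one semantic difference, flagged in the comments: `Objects.equals(null, null)` is `true`, so the explicit null check is kept to match the original code's behavior (a null current user is never treated as admin). Names here are hypothetical, not the actual Hive patch:

```java
import java.util.Objects;

// Hypothetical sketch of the reviewer's suggestion: a null-safe user
// comparison. Unlike the raw suggestion, the explicit null check is retained
// so that (null, null) does not count as a match.
class UserCheckSketch {
  static boolean isSameUser(String currentUser, String loginUser) {
    return currentUser != null && Objects.equals(currentUser, loginUser);
  }
}
```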
[jira] [Updated] (HIVE-24376) SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
[ https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-24376:
------------------------------------
    Parent: HIVE-24384
    Issue Type: Sub-task  (was: Improvement)

> SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-24376
>                 URL: https://issues.apache.org/jira/browse/HIVE-24376
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> The mode name is also a bit confusing, but here is what happens:
> {code}
> TS[A1] -> ...
> TS[A2] -> JOIN
> TS[B]  -> JOIN
> {code}
> We have an SJ edge between TS[B] -> TS[A2] to communicate information about
> the join keys; let's assume the reduction ratio was r.
> RemoveSemijoin right now does the following:
> * removes the semijoin edge (so TS[A2] will become a full scan)
> * merges TS[A1] and TS[A2]
> With respect to reading data from disk this is great: we accessed A twice, of
> which one was a full scan, and now we only read it once.
> But from a row-traffic perspective, TS[A2] emits more rows from now on because
> we don't have the r-ratio semijoin reduction anymore.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-24376) SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
[ https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24376: --- Assignee: Zoltan Haindrich > SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin > mode > -- > > Key: HIVE-24376 > URL: https://issues.apache.org/jira/browse/HIVE-24376 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > the mode name is also a bit confusing..but here is what happens: > {code} > TS[A1] -> ... > TS[A2] -> JOIN > TS[B] -> JOIN > {code} > we have an SJ edge between TS[B] -> TS[A2] to communicate informations about > the join keys; lets assume the reducation ratio was r. > RemoveSemijoin right now does the following: > * removes the semijoin edge (so TS[A2] will become a full scan) > * merges TS[A1] and TS[A2] > w.r.t to read data from disk: this is great - we accessed A twice; from which > 1 was a full scan - and now we only read it once. > but from row traffic perspective: TS[A2] emits more rows from now on because > we dont have the r ratio semijoin reduction anymore. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24778) Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
[ https://issues.apache.org/jira/browse/HIVE-24778?focusedWorklogId=553520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553520 ]

ASF GitHub Bot logged work on HIVE-24778:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 10:37
Start Date: 17/Feb/21 10:37
Worklog Time Spent: 10m
Work Description: zabetak commented on a change in pull request #1982:
URL: https://github.com/apache/hive/pull/1982#discussion_r577501389

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java
## @@ -84,12 +84,12 @@ public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumen
         "The function TIMESTAMP takes only primitive types");
     }
-    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {
       PrimitiveCategory category = argumentOI.getPrimitiveCategory();
       PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category);
       if (group == PrimitiveGrouping.NUMERIC_GROUP) {
         throw new UDFArgumentException(
-            "Casting NUMERIC types to TIMESTAMP is prohibited (" + ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION + ")");
+            "Casting NUMERIC types to TIMESTAMP is prohibited (" + ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY + ")");
       }

Review comment: Should this be here or rather in `TypeCheckProcFactory.DefaultExprProcessor#validateUDF`?

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/TimestampCastRestrictorResolver.java
## @@ -45,7 +45,7 @@ public TimestampCastRestrictorResolver(UDFMethodResolver parentResolver) {
     this.parentResolver = parentResolver;
     SessionState ss = SessionState.get();
-    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {

Review comment: If I understand well this class is used to restrict casting timestamp/date to boolean, double, byte, float, integer, long, short values. I am not sure why we should deal with these checks at this point but if we really need this then I guess it makes sense to extend it so that we apply the same checks for all types under `hive.strict.checks.type.safety` property. Should we create another JIRA for this?

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java
## @@ -166,10 +166,10 @@ protected void checkConversionAllowed(ObjectInspector argOI, ObjectInspector com
       return;
     }
     SessionState ss = SessionState.get();
-    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
+    if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY)) {
       if (primitiveGroupOf(compareOI) == PrimitiveGrouping.NUMERIC_GROUP) {
         throw new UDFArgumentException(
-            "Casting DATE/TIMESTAMP to NUMERIC is prohibited (" + ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION + ")");
+            "Casting DATE/TIMESTAMP to NUMERIC is prohibited (" + ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY + ")");

Review comment: Why do we need this `checkConversionAllowed` method? If conversion is incompatible/dangerous shouldn't this be caught by `TypeCheckProcFactory.DefaultExprProcessor#validateUDF`?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 553520)
Time Spent: 20m (was: 10m)

> Unify hive.strict.timestamp.conversion and hive.strict.checks.type.safety properties
> ------------------------------------------------------------------------------------
>
> Key: HIVE-24778
> URL: https://issues.apache.org/jira/browse/HIVE-24778
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 4.0.0
> Reporter: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The majority of strict type checks can be controlled by the {{hive.strict.checks.type.safety}} property. HIVE-24157 introduced another property, namely {{hive.strict.timestamp.conversion}}, to control the implicit comparisons between numerics and timestamps.
> The name and description of {{hive.strict.checks.type.safety}} imply that the property covers all strict checks so having others for
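The check being reviewed above follows one pattern throughout: read a boolean strict-check flag from the session config, and reject the suspicious cast with an exception when the flag is on. A minimal plain-Java sketch of that pattern is below. All class and field names here (StrictCheckSketch, strictTypeSafety, PrimitiveGrouping) are illustrative stand-ins, not the real Hive classes; in Hive the flag comes from ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_CHECKS_TYPE_SAFETY) and the exception is a UDFArgumentException.

```java
// Illustrative sketch of the config-gated strict type check discussed in the
// review above. Stand-in names only; not the real Hive classes.
public class StrictCheckSketch {

    // Stand-in for the hive.strict.checks.type.safety boolean config flag.
    public static boolean strictTypeSafety = true;

    // Stand-in for Hive's PrimitiveObjectInspectorUtils.PrimitiveGrouping.
    public enum PrimitiveGrouping { NUMERIC_GROUP, STRING_GROUP, DATE_GROUP }

    // Rejects NUMERIC arguments to a TIMESTAMP cast while the flag is on;
    // same shape as the guard in GenericUDFTimestamp.initialize.
    public static void checkTimestampCast(PrimitiveGrouping group) {
        if (strictTypeSafety && group == PrimitiveGrouping.NUMERIC_GROUP) {
            throw new IllegalArgumentException(
                "Casting NUMERIC types to TIMESTAMP is prohibited (hive.strict.checks.type.safety)");
        }
    }

    public static void main(String[] args) {
        checkTimestampCast(PrimitiveGrouping.STRING_GROUP);   // allowed, no exception
        try {
            checkTimestampCast(PrimitiveGrouping.NUMERIC_GROUP);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The reviewer's question is essentially where this guard should live: inside each UDF (as in the diff) or centrally in `TypeCheckProcFactory.DefaultExprProcessor#validateUDF`, where one flag check would cover every UDF.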
[jira] [Work logged] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
[ https://issues.apache.org/jira/browse/HIVE-24743?focusedWorklogId=553479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553479 ]

ASF GitHub Bot logged work on HIVE-24743:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 08:41
Start Date: 17/Feb/21 08:41
Worklog Time Spent: 10m

Work Description: kishendas commented on a change in pull request #1956:
URL: https://github.com/apache/hive/pull/1956#discussion_r577420383

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -3195,17 +3196,24 @@ public void alterPartitions(String dbName, String tableName,
   public List<Partition> getPartitionsByNames(String dbName, String tableName,
       List<String> partitionNames) throws HiveException {
     try {
-      return getMSC().getPartitionsByNames(dbName, tableName, partitionNames);
+      GetPartitionsByNamesRequest req = new GetPartitionsByNamesRequest();
+      req.setDb_name(dbName);
+      req.setTbl_name(tableName);
+      req.setNames(partitionNames);
+      return getPartitionsByNames(req, null);
     } catch (Exception e) {
       LOG.error("Failed getPartitionsByNames", e);
       throw new HiveException(e);
     }
   }

-  public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req)
+  public List<Partition> getPartitionsByNames(GetPartitionsByNamesRequest req,
+      Table table)
       throws HiveException {
     try {
-      Table table = getTable(req.getDb_name(), req.getTbl_name());
+      if( table == null ) {
+        table = getTable(req.getDb_name(), req.getTbl_name());

Review comment: Table Id is not cached right now in the local cache, so it would be the same.
Issue Time Tracking
-------------------
Worklog Id: (was: 553479)
Time Spent: 1h (was: 50m)

> [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2
> ------------------------------------------------------------------
>
> Key: HIVE-24743
> URL: https://issues.apache.org/jira/browse/HIVE-24743
> Project: Hive
> Issue Type: Sub-task
> Reporter: Kishen Das
> Assignee: Kishen Das
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> As part of HIVE-23821 (Send tableId in request for all the new HMS get_partition APIs) we added logic to send tableId in the request for several get_partition APIs, but it looks like it was missed for getPartitionsByNames. TableId and validWriteIdList are used to maintain consistency when an HMS API response is being served from a remote cache.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
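The overload in the diff above applies a common pattern: the caller may pass an already-resolved Table so the method can skip a second metastore round-trip, and only falls back to a fresh lookup when the argument is null. A self-contained sketch of that pattern is below; every name in it (LookupSketch, resolveTableId, the METASTORE map) is a made-up stand-in, not the real Hive or metastore API, and the lookup counter exists only to make the saved round-trip visible.

```java
// Illustrative stand-in for the "reuse the caller's Table when available"
// overload pattern shown in the Hive.java diff above.
import java.util.HashMap;
import java.util.Map;

public class LookupSketch {

    public static class Table {
        public final long tableId;
        public Table(long tableId) { this.tableId = tableId; }
    }

    // Counts simulated remote metastore fetches.
    public static int lookups = 0;

    private static final Map<String, Table> METASTORE = new HashMap<>();
    static { METASTORE.put("default.t1", new Table(42L)); }

    // Stand-in for a remote getTable call; each invocation is a round-trip.
    public static Table getTable(String db, String tbl) {
        lookups++;
        return METASTORE.get(db + "." + tbl);
    }

    // Overload pattern from the diff: resolve the table only when the caller
    // did not supply one, then use its id (e.g. to stamp the request).
    public static long resolveTableId(String db, String tbl, Table table) {
        if (table == null) {
            table = getTable(db, tbl);  // fallback metastore fetch
        }
        return table.tableId;
    }

    public static void main(String[] args) {
        Table cached = getTable("default", "t1");   // 1 round-trip
        resolveTableId("default", "t1", cached);    // reused, no extra trip
        resolveTableId("default", "t1", null);      // forces a 2nd trip
        System.out.println("lookups = " + lookups); // prints "lookups = 2"
    }
}
```

The reviewer's note that "Table Id is not cached right now in the local cache" explains why the null path costs the same as before the patch: the fallback always goes to the metastore.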
[jira] [Updated] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arko Sharma updated HIVE-24791:
-------------------------------
Attachment: HIVE-24791.01.patch

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24791.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?focusedWorklogId=553471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553471 ]

ASF GitHub Bot logged work on HIVE-24791:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Feb/21 08:17
Start Date: 17/Feb/21 08:17
Worklog Time Spent: 10m

Work Description: ArkoSharma opened a new pull request #1988:
URL: https://github.com/apache/hive/pull/1988

Issue Time Tracking
-------------------
Worklog Id: (was: 553471)
Remaining Estimate: 0h
Time Spent: 10m

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Updated] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24791:
----------------------------------
Labels: pull-request-available (was: )

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Assigned] (HIVE-24791) Backward compatibility issue in _dumpmetadata
[ https://issues.apache.org/jira/browse/HIVE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arko Sharma reassigned HIVE-24791:
----------------------------------

> Backward compatibility issue in _dumpmetadata
> ---------------------------------------------
>
> Key: HIVE-24791
> URL: https://issues.apache.org/jira/browse/HIVE-24791
> Project: Hive
> Issue Type: Bug
> Reporter: Arko Sharma
> Assignee: Arko Sharma
> Priority: Major
>