[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows
[ https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585575 ]

ASF GitHub Bot logged work on HIVE-24761:
Author: ASF GitHub Bot
Created on: 20/Apr/21 05:58
Start Date: 20/Apr/21 05:58
Worklog Time Spent: 10m

Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616364689

File path: ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt

@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {

   private static final long serialVersionUID = 1L;

-  private final int colNum1;
   private final int colNum2;

Review comment: It will be better to have the two input columns together in the same place. But it is a good idea to move this to the parent class and share the assigning code for all subclasses. Can we introduce two subclasses, VectorUnaryExpression and VectorBinaryExpression (or similar names), for VectorExpression and then extend the other classes from one of these? That way we can also avoid passing -1 from classes that do not use the input column.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 585575)
Time Spent: 50m (was: 40m)

> Vectorization: Support PTF - bounded start windows
> --------------------------------------------------
>
> Key: HIVE-24761
> URL: https://issues.apache.org/jira/browse/HIVE-24761
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> {code}
> notVectorizedReason: PTF operator: *** only UNBOUNDED start frame is supported
> {code}
> Currently, bounded windows are not supported in VectorPTFOperator. If we simply remove the compile-time check
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2911
> {code}
> if (!windowFrameDef.isStartUnbounded()) {
>   setOperatorIssue(functionName + " only UNBOUNDED start frame is supported");
>   return false;
> }
> {code}
> we get incorrect results, because the vectorized codepath completely ignores boundaries and simply iterates through all the input batches in [VectorPTFGroupBatches|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFGroupBatches.java#L172]:
> {code}
> for (VectorPTFEvaluatorBase evaluator : evaluators) {
>   evaluator.evaluateGroupBatch(batch);
>   if (isLastGroupBatch) {
>     evaluator.doLastBatchWork();
>   }
> }
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
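The refactoring proposed in the review above can be sketched roughly as follows. The class names follow the reviewer's suggestion (VectorUnaryExpression / VectorBinaryExpression); the fields, constructors, and the LongColAddLongCol stand-in are illustrative toy code, not Hive's actual generated expression classes.

```java
// Base class owns only the output column, as today.
abstract class VectorExpression {
    protected final int outputColumnNum;

    protected VectorExpression(int outputColumnNum) {
        this.outputColumnNum = outputColumnNum;
    }
}

// One input column: subclasses no longer need to pass -1 for an unused column.
abstract class VectorUnaryExpression extends VectorExpression {
    protected final int colNum;

    protected VectorUnaryExpression(int colNum, int outputColumnNum) {
        super(outputColumnNum);
        this.colNum = colNum;
    }
}

// Two input columns, assigned once here instead of in every template subclass.
abstract class VectorBinaryExpression extends VectorExpression {
    protected final int colNum1;
    protected final int colNum2;

    protected VectorBinaryExpression(int colNum1, int colNum2, int outputColumnNum) {
        super(outputColumnNum);
        this.colNum1 = colNum1;
        this.colNum2 = colNum2;
    }
}

// A stand-in for a generated ColumnArithmeticColumn expression.
class LongColAddLongCol extends VectorBinaryExpression {
    LongColAddLongCol(int colNum1, int colNum2, int outputColumnNum) {
        super(colNum1, colNum2, outputColumnNum);
    }

    long eval(long a, long b) {
        return a + b;
    }
}

public class Main {
    public static void main(String[] args) {
        LongColAddLongCol add = new LongColAddLongCol(0, 1, 2);
        // The input columns are wired through the shared parent constructor.
        System.out.println(add.colNum1 + " " + add.colNum2 + " -> " + add.eval(2, 3));
    }
}
```

Generated unary expressions would extend VectorUnaryExpression the same way, which removes the `-1` placeholder-column convention the reviewer objects to.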
[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows
[ https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585574 ]

ASF GitHub Bot logged work on HIVE-24761:
Author: ASF GitHub Bot
Created on: 20/Apr/21 05:57
Start Date: 20/Apr/21 05:57
Worklog Time Spent: 10m

Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616365379

File path: ql/src/test/results/clientpositive/llap/windowing_udaf.q.out

@@ -503,7 +503,7 @@
 alice brown	25.2587496
 alice brown	25.5293748
 alice brown	25.63012987012987
 alice brown	26.472439024390237
-alice brown	27.100638297872322
+alice brown	27.27881720430106

Review comment: Can we have a comment in the PR review explaining why this value changed after this patch?

Issue Time Tracking
-------------------
Worklog Id: (was: 585574)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows
[ https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585572 ]

ASF GitHub Bot logged work on HIVE-24761:
Author: ASF GitHub Bot
Created on: 20/Apr/21 05:55
Start Date: 20/Apr/21 05:55
Worklog Time Spent: 10m

Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616364689

File path: ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt

@@ -34,20 +34,17 @@ public class <ClassName> extends VectorExpression {

   private static final long serialVersionUID = 1L;

-  private final int colNum1;
   private final int colNum2;

Review comment: It will be better to have the two input columns together in the same place. But it is a good idea to move this to the parent class and share the assigning code for all subclasses. Can we introduce two subclasses, VectorUnaryExpression and VectorBinaryExpression (or similar names), for VectorExpression and then extend the other classes from one of these?

Issue Time Tracking
-------------------
Worklog Id: (was: 585572)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585567 ]

ASF GitHub Bot logged work on HIVE-25026:
Author: ASF GitHub Bot
Created on: 20/Apr/21 05:43
Start Date: 20/Apr/21 05:43
Worklog Time Spent: 10m

Work Description: pvary commented on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822989511

> hi @kgyrtkirk @pvary
> Please help check why the split18-postprocess failed

@zhangheihei: You can follow the `Details` link next to the failed CI run, and that will take you to a page where you can see the test details. On the top right of that page there is a `Tests` link where you can see the test results. In this specific case, there is a failure in `TestNegativeCliDriver`, likely a flaky test. I would push a minimal commit without a real change to retrigger the CI.

Thanks,
Peter

Issue Time Tracking
-------------------
Worklog Id: (was: 585567)
Time Spent: 40m (was: 0.5h)

> hive sql result is duplicate data cause of same task resubmission
> -----------------------------------------------------------------
>
> Key: HIVE-25026
> URL: https://issues.apache.org/jira/browse/HIVE-25026
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.1.1
> Reporter: hezhang
> Assignee: hezhang
> Priority: Critical
> Labels: pull-request-available
> Attachments: HIVE-25026.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> This issue is the same as HIVE-24577.
[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry
[ https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=585563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585563 ]

ASF GitHub Bot logged work on HIVE-24705:
Author: ASF GitHub Bot
Created on: 20/Apr/21 05:29
Start Date: 20/Apr/21 05:29
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on a change in pull request #1960:
URL: https://github.com/apache/hive/pull/1960#discussion_r616354694

File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

@@ -13707,14 +13709,27 @@ ASTNode analyzeCreateTable(

   /** Adds entities for create table/create view. */
   private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type,
-      boolean isTemporary, Map<String, String> tblProps) throws SemanticException {
+      boolean isTemporary, Map<String, String> tblProps, StorageFormat storageFormat) throws SemanticException {
     Database database = getDatabase(qualifiedTabName[0]);
     outputs.add(new WriteEntity(database, WriteEntity.WriteType.DDL_SHARED));
     Table t = new Table(qualifiedTabName[0], qualifiedTabName[1]);
     t.setParameters(tblProps);
     t.setTableType(type);
     t.setTemporary(isTemporary);
+    HiveStorageHandler storageHandler = null;
+    if (storageFormat.getStorageHandler() != null) {
+      try {
+        storageHandler = (HiveStorageHandler) ReflectionUtils.newInstance(
+            conf.getClassByName(storageFormat.getStorageHandler()), SessionState.get().getConf());
+      } catch (ClassNotFoundException ex) {
+        System.out.println("Class not found. Storage handler will be set to null: " + ex);

Review comment: Sorry, I was printing this to STDOUT for debugging during development. I forgot to change this to a log call before pushing.

Issue Time Tracking
-------------------
Worklog Id: (was: 585563)
Time Spent: 1h 20m (was: 1h 10m)

> Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-24705
> URL: https://issues.apache.org/jira/browse/HIVE-24705
> Project: Hive
> Issue Type: Improvement
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> With doAs=false in Hive 3.x, whenever a user tries to create a table based on a storage handler over external storage, for example an HBase table, the end user we see is hive, so we cannot really enforce the condition in Apache Ranger/Sentry on the end user. So we need to enforce this condition in Hive in the event of create/alter/drop of tables based on storage handlers. Built-in Hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, etc. should implement a method getURIForAuthentication() which returns a URI that is formed from table properties. This URI can be sent for authorization to Ranger/Sentry.
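The fix the review comment above alludes to is replacing the leftover debug `System.out.println` with a proper log call that keeps the exception attached. A minimal sketch, using `java.util.logging` as a self-contained stand-in for Hive's slf4j logger; the `loadStorageHandler` helper and its reflection details are illustrative, not SemanticAnalyzer's actual code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main {
    private static final Logger LOG = Logger.getLogger(Main.class.getName());

    // Illustrative helper: resolve a handler class by name, falling back to null.
    static Object loadStorageHandler(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException ex) {
            // Log at WARNING with the exception attached instead of printing to
            // stdout; the handler simply stays null, matching the original fallback.
            LOG.log(Level.WARNING, "Class not found. Storage handler will be set to null: " + className, ex);
            return null;
        } catch (ReflectiveOperationException ex) {
            LOG.log(Level.WARNING, "Could not instantiate storage handler " + className, ex);
            return null;
        }
    }

    public static void main(String[] args) {
        // An unknown class name exercises the ClassNotFoundException branch.
        Object handler = loadStorageHandler("org.example.NoSuchHandler");
        System.out.println(handler == null ? "null handler" : handler.toString());
    }
}
```

Passing the exception object (rather than concatenating it into the message) preserves the stack trace in the log, which the original println discarded.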
[jira] [Work logged] (HIVE-24761) Vectorization: Support PTF - bounded start windows
[ https://issues.apache.org/jira/browse/HIVE-24761?focusedWorklogId=585550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585550 ]

ASF GitHub Bot logged work on HIVE-24761:
Author: ASF GitHub Bot
Created on: 20/Apr/21 04:26
Start Date: 20/Apr/21 04:26
Worklog Time Spent: 10m

Work Description: ramesh0201 commented on a change in pull request #2099:
URL: https://github.com/apache/hive/pull/2099#discussion_r616334377

File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java

@@ -3005,6 +3003,19 @@ private boolean validatePTFOperator(PTFOperator op, VectorizationContext vContex
       }
     }
   }
+  if (vectorPTFDesc.getOrderExprNodeDescs().length > 1) {
+    /*
+     * Currently, we need to rule out here all cases where a range boundary scanner can run,
+     * basically: 1. bounded start 2. bounded end which is not current row
+     */
+    if (windowFrameDef.getWindowType() == WindowType.RANGE
+        && (!windowFrameDef.isStartUnbounded() || !windowFrameDef.getEnd().isCurrentRow())) {

Review comment: I am not much aware of the range boundary scanner, but do we have any issue vectorizing UNBOUNDED FOLLOWING? Can you help me understand how the range boundary scanner affects vectorization?

Issue Time Tracking
-------------------
Worklog Id: (was: 585550)
Time Spent: 20m (was: 10m)
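The validation rule quoted in the diff above can be restated as a small predicate: with more than one ORDER BY expression, a RANGE window is rejected for vectorization unless it is UNBOUNDED PRECEDING to CURRENT ROW. This is a toy model of that condition; the method name and boolean parameters are illustrative, not Hive's actual Vectorizer API.

```java
public class Main {
    // Mirrors the quoted check: rule out cases where a range boundary scanner
    // would have to run (bounded start, or a bounded end other than CURRENT ROW).
    static boolean ptfWindowSupported(boolean isRangeWindow, boolean startUnbounded,
                                      boolean endIsCurrentRow, int orderExprCount) {
        if (orderExprCount > 1 && isRangeWindow
                && (!startUnbounded || !endIsCurrentRow)) {
            return false; // would need a range boundary scanner; not vectorized yet
        }
        return true;
    }

    public static void main(String[] args) {
        // RANGE UNBOUNDED PRECEDING .. CURRENT ROW with two order keys: still allowed.
        System.out.println(ptfWindowSupported(true, true, true, 2));    // true
        // RANGE with a bounded start: ruled out by this check.
        System.out.println(ptfWindowSupported(true, false, true, 2));   // false
        // A ROWS window is not affected by this particular check.
        System.out.println(ptfWindowSupported(false, false, false, 2)); // true
    }
}
```

Note that by this model the check only fires when there is more than one order expression, which is part of what the reviewer's UNBOUNDED FOLLOWING question is probing.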
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585537 ]

ASF GitHub Bot logged work on HIVE-25026:
Author: ASF GitHub Bot
Created on: 20/Apr/21 03:21
Start Date: 20/Apr/21 03:21
Worklog Time Spent: 10m

Work Description: zhangheihei edited a comment on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822936070

hi @kgyrtkirk @pvary
Please help check why the split18-postprocess failed

Issue Time Tracking
-------------------
Worklog Id: (was: 585537)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585533 ]

ASF GitHub Bot logged work on HIVE-25026:
Author: ASF GitHub Bot
Created on: 20/Apr/21 02:58
Start Date: 20/Apr/21 02:58
Worklog Time Spent: 10m

Work Description: zhangheihei commented on pull request #2189:
URL: https://github.com/apache/hive/pull/2189#issuecomment-822936070

hi @kgyrtkirk. Please help check why the split18-postprocess failed

Issue Time Tracking
-------------------
Worklog Id: (was: 585533)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?focusedWorklogId=585509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585509 ]

ASF GitHub Bot logged work on HIVE-24909:
Author: ASF GitHub Bot
Created on: 20/Apr/21 01:30
Start Date: 20/Apr/21 01:30
Worklog Time Spent: 10m

Work Description: hmangla98 commented on a change in pull request #2101:
URL: https://github.com/apache/hive/pull/2101#discussion_r616280662

File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java

@@ -8582,7 +8582,8 @@ public GetOpenTxnsInfoResponse get_open_txns_info() throws TException {

   public OpenTxnsResponse open_txns(OpenTxnRequest rqst) throws TException {
     OpenTxnsResponse response = getTxnHandler().openTxns(rqst);
     List<Long> txnIds = response.getTxn_ids();
-    if (txnIds != null && listeners != null && !listeners.isEmpty()) {
+    boolean isHiveReplTxn = rqst.isSetReplPolicy() && rqst.getTxn_type() == TxnType.DEFAULT;

Review comment: The rqst variable can be of different types (OpenTxnRequest/CommitTxnRequest/AbortTxnRequest), so this statement may not be generalised.

Issue Time Tracking
-------------------
Worklog Id: (was: 585509)
Time Spent: 5h 20m (was: 5h 10m)

> Skip the repl events from getting logged in notification log
> -------------------------------------------------------------
>
> Key: HIVE-24909
> URL: https://issues.apache.org/jira/browse/HIVE-24909
> Project: Hive
> Issue Type: Bug
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Currently, REPL dump events are logged and replicated as part of the replication policy. Whenever one replication cycle completes, we always have one transaction left open on the target corresponding to the repl dump operation. This will never be caught up without manually dealing with the transaction on the target cluster.
[jira] [Resolved] (HIVE-25005) Provide default implementation for HMS APIs
[ https://issues.apache.org/jira/browse/HIVE-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar resolved HIVE-25005.
Fix Version/s: 4.0.0
Resolution: Fixed

> Provide default implementation for HMS APIs
> -------------------------------------------
>
> Key: HIVE-25005
> URL: https://issues.apache.org/jira/browse/HIVE-25005
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2
> Reporter: Kishen Das
> Assignee: Kishen Das
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> If there is a remote cache that implements HMS APIs, it would be useful to have a default implementation for all the APIs, so that any new HMS API will not break the build for the remote cache.
[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs
[ https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585398 ]

ASF GitHub Bot logged work on HIVE-25005:
Author: ASF GitHub Bot
Created on: 19/Apr/21 22:08
Start Date: 19/Apr/21 22:08
Worklog Time Spent: 10m

Work Description: vihangk1 merged pull request #2171:
URL: https://github.com/apache/hive/pull/2171

Issue Time Tracking
-------------------
Worklog Id: (was: 585398)
Time Spent: 50m (was: 40m)
[jira] [Commented] (HIVE-25005) Provide default implementation for HMS APIs
[ https://issues.apache.org/jira/browse/HIVE-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325344#comment-17325344 ]

Vihang Karajgaonkar commented on HIVE-25005:

Patch was merged into master. Thanks [~kishendas]!
[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs
[ https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585396 ]

ASF GitHub Bot logged work on HIVE-25005:
Author: ASF GitHub Bot
Created on: 19/Apr/21 22:07
Start Date: 19/Apr/21 22:07
Worklog Time Spent: 10m

Work Description: vihangk1 commented on pull request #2171:
URL: https://github.com/apache/hive/pull/2171#issuecomment-822817878

The test failures are unrelated, since this patch doesn't really make functional changes other than adding a new abstract class. The tests worked in the previous iteration of the patch, which only had a formatting change. Kishen created https://issues.apache.org/jira/browse/HIVE-25030 and https://issues.apache.org/jira/browse/HIVE-25031 to track the failures.

Issue Time Tracking
-------------------
Worklog Id: (was: 585396)
Time Spent: 40m (was: 0.5h)
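The idea behind HIVE-25005 (a default implementation for every HMS API so a new method doesn't break a remote cache's build) is the classic interface-plus-adapter pattern. A minimal sketch with a made-up two-method interface; none of these names are the real IMetaStoreClient/HMS API, and the real patch's abstract class may differ.

```java
// Hypothetical stand-in for the HMS API surface.
interface MetaStoreApi {
    String getTable(String name);
    int getTableCount(); // imagine this method was added after the cache was written
}

// Default implementations: each method throws until a subclass overrides it,
// so implementors compile even when the interface grows.
abstract class MetaStoreApiAdapter implements MetaStoreApi {
    @Override
    public String getTable(String name) {
        throw new UnsupportedOperationException("getTable not implemented");
    }

    @Override
    public int getTableCount() {
        throw new UnsupportedOperationException("getTableCount not implemented");
    }
}

// A remote cache written before getTableCount() existed still compiles.
class CachingClient extends MetaStoreApiAdapter {
    @Override
    public String getTable(String name) {
        return "cached:" + name;
    }
}

public class Main {
    public static void main(String[] args) {
        MetaStoreApi client = new CachingClient();
        System.out.println(client.getTable("t1")); // cached:t1
        try {
            client.getTableCount();
        } catch (UnsupportedOperationException e) {
            System.out.println("new API not yet implemented by the cache");
        }
    }
}
```

Java default interface methods would achieve the same decoupling; an abstract base class (as the merged patch adds) keeps the interface itself unchanged.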
[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585381 ]

ASF GitHub Bot logged work on HIVE-24957:
Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m

Work Description: zabetak closed pull request #2186:
URL: https://github.com/apache/hive/pull/2186

Issue Time Tracking
-------------------
Worklog Id: (was: 585381)
Time Spent: 0.5h (was: 20m)

> Wrong results when subquery has COALESCE in correlation predicate
> -----------------------------------------------------------------
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Consider the following example:
> {code:sql}
> create table author (
>   a_authorkey int,
>   a_name varchar(50));
>
> create table book (
>   b_bookkey int,
>   b_title varchar(50),
>   b_authorkey int);
>
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
>
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
>
> select b.b_title
> from book b
> where exists
>   (select a_authorkey
>    from author a
>    where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set, and it shouldn't be, since with the application of the COALESCE operator it should match the UNKNOWN author.
[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585380 ]

ASF GitHub Bot logged work on HIVE-24957:
Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m

Work Description: zabetak commented on pull request #2186:
URL: https://github.com/apache/hive/pull/2186#issuecomment-822803363

Close and reopen to trigger checks

Issue Time Tracking
-------------------
Worklog Id: (was: 585380)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HIVE-24957) Wrong results when subquery has COALESCE in correlation predicate
[ https://issues.apache.org/jira/browse/HIVE-24957?focusedWorklogId=585382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585382 ]

ASF GitHub Bot logged work on HIVE-24957:
Author: ASF GitHub Bot
Created on: 19/Apr/21 21:39
Start Date: 19/Apr/21 21:39
Worklog Time Spent: 10m

Work Description: zabetak opened a new pull request #2186:
URL: https://github.com/apache/hive/pull/2186

### What changes were proposed in this pull request and why?
Check commit messages for HIVE-24999 and HIVE-24957.

### Does this PR introduce _any_ user-facing change?
Plan changes when using explain (normally more efficient plans) and correct query results.

### How was this patch tested?
Via existing tests:
```
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex="subquery.*"
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex="masking.*"
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver
```
Via newly added tests:
```
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="subquery_complex_correlation_predicates.q"
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile="subquery_in_invalid_intermediate_plan.q" -Dcalcite.debug
```

Issue Time Tracking
-------------------
Worklog Id: (was: 585382)
Time Spent: 40m (was: 0.5h)
[jira] [Updated] (HIVE-25028) Hive: Select query with IS operator producing unexpected result
[ https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-25028: --- Reporter: Manthan B Y (was: Soumyakanti Das) > Hive: Select query with IS operator producing unexpected result > --- > > Key: HIVE-25028 > URL: https://issues.apache.org/jira/browse/HIVE-25028 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Manthan B Y >Assignee: Soumyakanti Das >Priority: Major > > Hive: Select query with IS operator is producing unexpected result. > The following was executed on postgres: > {code:java} > sqlancer=# create table if not exists emp(name text, age int); > CREATE TABLE > sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12); > INSERT 0 3 > sqlancer=# select emp.age from emp where emp.age > 10; > age > - > 15 > 12 > (2 rows)sqlancer=# select emp.age > 10 is true from emp; > ?column? > -- > f > t > t > (3 rows){code} > This is happening because IS operator has higher precedence than comparison > operators in Hive. In most other databases, comparison operator has higher > precedence. The grammar needs to be changed to fix the precedence. -- This message was sent by Atlassian Jira (v8.3.4#803005)
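[Editor's note] To make the precedence point above concrete: per the report, Hive parses `emp.age > 10 is true` as `emp.age > (10 is true)`, while Postgres and most databases parse it as `(emp.age > 10) is true`. The sketch below evaluates only the standard parse against the three inserted rows (Hive's pre-fix output is not shown in the ticket, so it is not reproduced here; names are illustrative):

```java
public class IsPrecedenceDemo {
    // Standard SQL precedence: the comparison binds tighter than IS,
    // so `age > 10 IS TRUE` means `(age > 10) IS TRUE`.
    static boolean standardParse(int age) {
        boolean cmp = age > 10;
        return cmp; // IS TRUE of a non-null boolean is the boolean itself
    }

    public static void main(String[] args) {
        int[] ages = {5, 15, 12}; // the rows inserted into emp above
        StringBuilder out = new StringBuilder();
        for (int age : ages) {
            out.append(standardParse(age) ? 't' : 'f').append('\n');
        }
        System.out.print(out); // f, t, t -- matching the Postgres output above
    }
}
```

The grammar fix therefore only needs to lower the precedence of IS below the comparison operators; the evaluation of either subexpression is unchanged.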
[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry
[ https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=585329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585329 ] ASF GitHub Bot logged work on HIVE-24705: - Author: ASF GitHub Bot Created on: 19/Apr/21 19:09 Start Date: 19/Apr/21 19:09 Worklog Time Spent: 10m Work Description: nrg4878 commented on a change in pull request #1960: URL: https://github.com/apache/hive/pull/1960#discussion_r610056943 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13707,14 +13709,27 @@ ASTNode analyzeCreateTable( /** Adds entities for create table/create view. */ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, - boolean isTemporary, Map tblProps) throws SemanticException { + boolean isTemporary, Map tblProps, StorageFormat storageFormat) throws SemanticException { Database database = getDatabase(qualifiedTabName[0]); outputs.add(new WriteEntity(database, WriteEntity.WriteType.DDL_SHARED)); Table t = new Table(qualifiedTabName[0], qualifiedTabName[1]); t.setParameters(tblProps); t.setTableType(type); t.setTemporary(isTemporary); +HiveStorageHandler storageHandler = null; +if(storageFormat.getStorageHandler() != null) { + try { +storageHandler = (HiveStorageHandler) ReflectionUtils.newInstance( +conf.getClassByName(storageFormat.getStorageHandler()), SessionState.get().getConf()); + } catch (ClassNotFoundException ex) { +System.out.println("Class not found. Storage handler will be set to null: " + ex); Review comment: Please remove System.out.println and use a log handler instead. or throw an exception if we should not continue. 
## File path: ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java ## @@ -185,6 +191,33 @@ private static void addHivePrivObject(Entity privObject, Map tableProperties = new HashMap<>(); + Configuration conf = new Configuration(); + tableProperties.putAll(table.getSd().getSerdeInfo().getParameters()); + tableProperties.putAll(table.getParameters()); + try { +if(table.getStorageHandler() instanceof HiveStorageAuthorizationHandler){ + HiveStorageAuthorizationHandler authorizationHandler = (HiveStorageAuthorizationHandler) ReflectionUtils.newInstance( + conf.getClassByName(table.getStorageHandler().getClass().getName()), SessionState.get().getConf()); + storageuri = authorizationHandler.getURIForAuth(tableProperties).toString(); +}else{ + //Custom storage handler that has not implemented the HiveStorageAuthorizationHandler + storageuri = table.getStorageHandler().getClass().getName()+"://"+ + HiveCustomStorageHandlerUtils.getTablePropsForCustomStorageHandler(tableProperties); +} + }catch(Exception ex){ +ex.printStackTrace(); Review comment: can you either log this exception to the log file if this is concerning to the user or ignore it if it not an issue, with a comment instead of printStackTrace() ? This goes to STDOUT and not to the server log. ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageAuthorizationHandler.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.metadata; + +import org.apache.hadoop.hive.common.classification.InterfaceAudience; +import org.apache.hadoop.hive.common.classification.InterfaceStability; + +import java.net.URI; +import java.net.URISyntaxException; +import java.util.Map; + +/** + * HiveStorageAuthorizationHandler defines a pluggable interface for + * authorization of storage based tables in Hive. A Storage authorization + * handler consists of a bundle of the following: + * + * + *getURI + * + * + * Storage authorization handler classes are plugged in using the STORED BY 'classname' + * clause in CREATE TABLE. + */
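[Editor's note] The review thread above shows `getURIForAuth(tableProperties)` being called to derive an authorizable URI from table properties. A hypothetical implementation of that contract might look like the following. This is a plain-Java stand-in, not the actual Hive interface or any real storage handler; the `demo.*` property keys and the `authUriFor` helper are invented for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Plain-Java stand-in for the HiveStorageAuthorizationHandler contract:
// map a table's properties to a URI the authorizer (e.g. Ranger) can match.
interface StorageAuthHandler {
    URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException;
}

public class StorageAuthDemo implements StorageAuthHandler {
    @Override
    public URI getURIForAuth(Map<String, String> props) throws URISyntaxException {
        // Property keys here are hypothetical, not real Hive table properties.
        return new URI("demo",                         // scheme identifying the storage
                       props.get("demo.host"),         // authority
                       "/" + props.get("demo.table"),  // resource path
                       null);
    }

    // Convenience wrapper so callers need not handle the checked exception.
    static String authUriFor(Map<String, String> props) {
        try {
            return new StorageAuthDemo().getURIForAuth(props).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("demo.host", "db1.example.com",
                                           "demo.table", "orders");
        System.out.println(authUriFor(props)); // demo://db1.example.com/orders
    }
}
```

This also illustrates the reviewer's fallback branch: a handler that does not implement the interface gets only a synthetic `classname://properties` URI, which is why implementing the interface yields more precise authorization.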
[jira] [Work logged] (HIVE-25005) Provide default implementation for HMS APIs
[ https://issues.apache.org/jira/browse/HIVE-25005?focusedWorklogId=585327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585327 ] ASF GitHub Bot logged work on HIVE-25005: - Author: ASF GitHub Bot Created on: 19/Apr/21 19:06 Start Date: 19/Apr/21 19:06 Worklog Time Spent: 10m Work Description: kishendas commented on pull request #2171: URL: https://github.com/apache/hive/pull/2171#issuecomment-822710757 Created tickets for the flaky tests : https://issues.apache.org/jira/browse/HIVE-25030 and https://issues.apache.org/jira/browse/HIVE-25031 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585327) Time Spent: 0.5h (was: 20m) > Provide default implementation for HMS APIs > > > Key: HIVE-25005 > URL: https://issues.apache.org/jira/browse/HIVE-25005 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > If there is a remote cache that implements HMS APIs, it would be useful to > have default implementation for all the APIs, so that any new HMS API will > not break the build for the remote cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23756) drop table command fails with MySQLIntegrityConstraintViolationException:
[ https://issues.apache.org/jira/browse/HIVE-23756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325194#comment-17325194 ] Naveen Gangam commented on HIVE-23756: -- +1 on this patch. > drop table command fails with MySQLIntegrityConstraintViolationException: > - > > Key: HIVE-23756 > URL: https://issues.apache.org/jira/browse/HIVE-23756 > Project: Hive > Issue Type: Bug >Reporter: Ganesha Shreedhara >Assignee: Ganesha Shreedhara >Priority: Major > Attachments: HIVE-23756.1.patch > > > Drop table command fails intermittently with the following exception. > {code:java} > Caused by: java.sql.BatchUpdateException: Cannot delete or update a parent > row: a foreign key constraint fails ("metastore"."COLUMNS_V2", CONSTRAINT > "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") REFERENCES "CDS" ("CD_ID")) App > at > com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)at > com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277) > Appat > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372) > at > org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:207) > at > org.datanucleus.store.rdbms.SQLController.getStatementForUpdate(SQLController.java:179) > at > org.datanucleus.store.rdbms.scostore.JoinMapStore.clearInternal(JoinMapStore.java:901) > ... 
36 more > Caused by: > com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: > Cannot delete or update a parent row: a foreign key constraint fails > ("metastore"."COLUMNS_V2", CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY ("CD_ID") > REFERENCES "CDS" ("CD_ID")) > at sun.reflect.GeneratedConstructorAccessor121.newInstance(Unknown Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) > at com.mysql.jdbc.Util.getInstance(Util.java:360) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823){code} > Although HIVE-19994 resolves this issue, the FK constraint name of COLUMNS_V2 > table specified in package.jdo file is not same as the FK constraint name > used while creating COLUMNS_V2 table ([Ref|#L60]]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25017) Fix response in GetLatestCommittedCompaction
[ https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325149#comment-17325149 ] Yu-Wen Lai commented on HIVE-25017: --- [~klcopp] Thank you for reviewing! > Fix response in GetLatestCommittedCompaction > > > Key: HIVE-25017 > URL: https://issues.apache.org/jira/browse/HIVE-25017 > Project: Hive > Issue Type: Bug >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Dbname and Tablename are required for CompactionInfoStruct but the response > of getLatestCommittedCompactionInfo is not setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25010) Create qtest-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25010: -- Labels: pull-request-available (was: ) > Create qtest-iceberg module > --- > > Key: HIVE-25010 > URL: https://issues.apache.org/jira/browse/HIVE-25010 > Project: Hive > Issue Type: Test >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We should create a qtest-iceberg module under itests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25010) Create qtest-iceberg module
[ https://issues.apache.org/jira/browse/HIVE-25010?focusedWorklogId=585187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585187 ] ASF GitHub Bot logged work on HIVE-25010: - Author: ASF GitHub Bot Created on: 19/Apr/21 15:05 Start Date: 19/Apr/21 15:05 Worklog Time Spent: 10m Work Description: lcspinter opened a new pull request #2193: URL: https://github.com/apache/hive/pull/2193 ### What changes were proposed in this pull request? New q test module to run Iceberg specific q tests. ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585187) Remaining Estimate: 0h Time Spent: 10m > Create qtest-iceberg module > --- > > Key: HIVE-25010 > URL: https://issues.apache.org/jira/browse/HIVE-25010 > Project: Hive > Issue Type: Test >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We should create a qtest-iceberg module under itests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
[ https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585161 ] ASF GitHub Bot logged work on HIVE-25006: - Author: ASF GitHub Bot Created on: 19/Apr/21 14:20 Start Date: 19/Apr/21 14:20 Worklog Time Spent: 10m Work Description: marton-bod edited a comment on pull request #2161: URL: https://github.com/apache/hive/pull/2161#issuecomment-822504138 This is roughly what the changes would need to look like once we have the new Tez version released: https://github.com/marton-bod/hive/pull/1 (using the new Tez API instead of the listing) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585161) Time Spent: 0.5h (was: 20m) > Commit Iceberg writes in HiveMetaHook instead of TezAM > -- > > Key: HIVE-25006 > URL: https://issues.apache.org/jira/browse/HIVE-25006 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. > This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25006) Commit Iceberg writes in HiveMetaHook instead of TezAM
[ https://issues.apache.org/jira/browse/HIVE-25006?focusedWorklogId=585160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585160 ] ASF GitHub Bot logged work on HIVE-25006: - Author: ASF GitHub Bot Created on: 19/Apr/21 14:19 Start Date: 19/Apr/21 14:19 Worklog Time Spent: 10m Work Description: marton-bod commented on pull request #2161: URL: https://github.com/apache/hive/pull/2161#issuecomment-822504138 This is roughly what the changes would need to look like once we have the new Tez version released: https://github.com/marton-bod/hive/pull/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585160) Time Spent: 20m (was: 10m) > Commit Iceberg writes in HiveMetaHook instead of TezAM > -- > > Key: HIVE-25006 > URL: https://issues.apache.org/jira/browse/HIVE-25006 > Project: Hive > Issue Type: Task >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Trigger the write commits in the HiveIcebergStorageHandler#commitInsertTable. > This will enable us to implement insert overwrites for iceberg tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25029) Remove travis builds
[ https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325039#comment-17325039 ] Zoltan Haindrich commented on HIVE-25029: - If we want to keep something similar - we can access a lot more horsepower through GitHub Actions > Remove travis builds > > > Key: HIVE-25029 > URL: https://issues.apache.org/jira/browse/HIVE-25029 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > travis only compiles the project - we already do much more than that during > precommit testing. > (and it it sometimes delays build because travis cant allocate executors/etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25029) Remove travis builds
[ https://issues.apache.org/jira/browse/HIVE-25029?focusedWorklogId=585124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585124 ] ASF GitHub Bot logged work on HIVE-25029: - Author: ASF GitHub Bot Created on: 19/Apr/21 13:33 Start Date: 19/Apr/21 13:33 Worklog Time Spent: 10m Work Description: kgyrtkirk opened a new pull request #2192: URL: https://github.com/apache/hive/pull/2192 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585124) Remaining Estimate: 0h Time Spent: 10m > Remove travis builds > > > Key: HIVE-25029 > URL: https://issues.apache.org/jira/browse/HIVE-25029 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > travis only compiles the project - we already do much more than that during > precommit testing. > (and it it sometimes delays build because travis cant allocate executors/etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25029) Remove travis builds
[ https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25029: -- Labels: pull-request-available (was: ) > Remove travis builds > > > Key: HIVE-25029 > URL: https://issues.apache.org/jira/browse/HIVE-25029 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > travis only compiles the project - we already do much more than that during > precommit testing. > (and it it sometimes delays build because travis cant allocate executors/etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25029) Remove travis builds
[ https://issues.apache.org/jira/browse/HIVE-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25029: --- > Remove travis builds > > > Key: HIVE-25029 > URL: https://issues.apache.org/jira/browse/HIVE-25029 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > travis only compiles the project - we already do much more than that during > precommit testing. > (and it it sometimes delays build because travis cant allocate executors/etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17059) Hive Runtime Error while processing row (tag=0)
[ https://issues.apache.org/jira/browse/HIVE-17059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325032#comment-17325032 ] Dhanooj commented on HIVE-17059: Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable at --it seems data mapping issue > Hive Runtime Error while processing row (tag=0) > --- > > Key: HIVE-17059 > URL: https://issues.apache.org/jira/browse/HIVE-17059 > Project: Hive > Issue Type: Bug >Reporter: wenjie.yu >Priority: Major > > I run the sql looks like below in HIVE and got error:Hive Runtime Error > while processing row (tag=0) > *QUERY:* > select > dt as d_date > .. -- group by columns > ,min(epoch_time) as min_epoch_time > ,count(*) as cnt > from > DB.target_Table > where > dt = '20170705' > group by > dt > .. -- group by columns > *ERROR:* > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) > {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144","_col1":"1499186628.124"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row (tag=0) > {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144","_col1":"1499186628.124"}} > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253) > ... 7 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to > org.apache.hadoop.io.LongWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:763) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 7 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be > cast to org.apache.hadoop.io.LongWritable > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:150) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:609) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:848) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:692) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:758) > ... 8 more > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.mr.MapRedTask > {color:#8eb021}1, The table is big one. 
{color} > 2, Hive Runtime Error while processing row (tag=0) > {"key":{"_col0":"20170705","_col1":"-2144668477","_col2":"4724a50e-9238-4146-9394-a076acd41836","_col3":"client://plusApp/pdtil_Purchase","_col4":"924","_col5":"61","_col6":"app","_col7":"A","_col8":"comic","_col9":"4724a50e-9238-4146-9394-a076acd41836","_col10":"46f572ce-ed86-4c2a-bf7a-db2171654144"},"value":{*"_col0":"46f572ce-ed86-4c2a-bf7a-db2171654144"*,"_col1":"1499186628.124"}} > i think col0 of Value should be a number because it is a count for group by > query. > but the value:46f572ce-ed86-4c2a-bf7a-db2171654144 looks be a value of > column(adid?) > I don't know why. > CDH-5.8.0-1.cdh5.8.0.p0.42 --
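[Editor's note] The failure mode in the stack trace above — a `Text` value reaching `GenericUDAFCountEvaluator.merge`, which expects a `LongWritable` partial count — can be reproduced in miniature without Hadoop. In this sketch `String`/`Long` stand in for the writable types, and the method name is only an analogy to the Hive evaluator:

```java
public class CastMismatchDemo {
    // Stand-in for the count evaluator's merge step: it assumes the
    // partial aggregate handed to it is a long count.
    static long mergePartialCount(Object partial) {
        return (Long) partial; // throws ClassCastException if a String arrives
    }

    public static void main(String[] args) {
        System.out.println(mergePartialCount(7L)); // the expected case: a count

        Object wrongColumn = "46f572ce-ed86-4c2a-bf7a-db2171654144"; // an id, not a count
        try {
            mergePartialCount(wrongColumn);
        } catch (ClassCastException e) {
            // Mirrors "Text cannot be cast to LongWritable": the reduce-side
            // value columns were mapped to the wrong aggregation buffer.
            System.out.println("ClassCastException caught");
        }
    }
}
```

This supports the commenter's reading: the count's value slot (`_col0`) is carrying a string id, i.e. a column-mapping problem between the shuffle output and the group-by aggregators rather than bad input data.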
[jira] [Updated] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24920: -- Labels: pull-request-available (was: ) > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?focusedWorklogId=585115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585115 ] ASF GitHub Bot logged work on HIVE-24920: - Author: ASF GitHub Bot Created on: 19/Apr/21 13:11 Start Date: 19/Apr/21 13:11 Worklog Time Spent: 10m Work Description: kgyrtkirk opened a new pull request #2191: URL: https://github.com/apache/hive/pull/2191 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585115) Remaining Estimate: 0h Time Spent: 10m > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24577) Task resubmission bug
[ https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang updated HIVE-24577: --- Attachment: HIVE-24577.patch > Task resubmission bug > - > > Key: HIVE-24577 > URL: https://issues.apache.org/jira/browse/HIVE-24577 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.4 > Environment: hive-2.3.4 >Reporter: guojh >Assignee: hezhang >Priority: Major > Fix For: 2.3.8 > > Attachments: HIVE-24577.patch > > > When hive execute jobs in parallel(control by “hive.exec.parallel” > parameter), tasks submit to yarn with parallel. If the jobs completed > simultaneously, then Their children task may submit more than ones. > In our production cluster, we have a query with the stage dependencies is > below: > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1, Stage-10, Stage-14 > Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5 > Stage-4 > Stage-0 depends on stages: Stage-4, Stage-3, Stage-6 > Stage-3 > Stage-5 > Stage-6 depends on stages: Stage-5 > Stage-18 is a root stage > Stage-9 depends on stages: Stage-18 > Stage-10 depends on stages: Stage-9 > Stage-19 is a root stage > Stage-13 depends on stages: Stage-19 > Stage-14 depends on stages: Stage-13 > {code} > There is a certain probability that Stage-10 and Stage-14 will complete at > the same time, then their children Stage-2 was submitted twice. 
As bellow log: > {code:java} > 2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 6 out of 6 > 2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > 2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 7 out of 6 > 2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24577) Task resubmission bug
[ https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325025#comment-17325025 ] hezhang commented on HIVE-24577: add patch to hive2. HIVE-25026 update for hive 3. > Task resubmission bug > - > > Key: HIVE-24577 > URL: https://issues.apache.org/jira/browse/HIVE-24577 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.4 > Environment: hive-2.3.4 >Reporter: guojh >Assignee: hezhang >Priority: Major > Fix For: 2.3.8 > > Attachments: HIVE-24577.patch > > > When hive execute jobs in parallel(control by “hive.exec.parallel” > parameter), tasks submit to yarn with parallel. If the jobs completed > simultaneously, then Their children task may submit more than ones. > In our production cluster, we have a query with the stage dependencies is > below: > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1, Stage-10, Stage-14 > Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5 > Stage-4 > Stage-0 depends on stages: Stage-4, Stage-3, Stage-6 > Stage-3 > Stage-5 > Stage-6 depends on stages: Stage-5 > Stage-18 is a root stage > Stage-9 depends on stages: Stage-18 > Stage-10 depends on stages: Stage-9 > Stage-19 is a root stage > Stage-13 depends on stages: Stage-19 > Stage-14 depends on stages: Stage-13 > {code} > There is a certain probability that Stage-10 and Stage-14 will complete at > the same time, then their children Stage-2 was submitted twice. 
As the log below shows: > {code:java} > 2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 6 out of 6 > 2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > 2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 7 out of 6 > 2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
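The race behind the duplicate "Starting task [Stage-2:MAPRED]" log lines can be reduced to a toy model. The sketch below is illustrative Java only — the class and method names are hypothetical, not Hive's actual Driver/TaskRunner code. It shows how a shared child stage gets launched once per parent when both parents are marked finished before either completion event runs the launch check:

```java
import java.util.*;

// Toy model of the HIVE-24577 race. Each stage completion (a) records the
// stage as finished and (b) later triggers a scan for launchable children.
// The buggy scan has no "already launched" guard, so if Stage-10 and
// Stage-14 both finish before either scan runs, Stage-2 is launched twice.
class DuplicateLaunchSketch {
    static final Map<String, List<String>> PARENTS =
            Map.of("Stage-2", List.of("Stage-10", "Stage-14"));
    static final Set<String> finished = new HashSet<>();
    static final List<String> launched = new ArrayList<>();

    static void markFinished(String stage) {
        finished.add(stage);
    }

    // Buggy: launches every child whose parents are all finished, without
    // checking whether that child was already launched by a prior scan.
    static void launchReadyChildren() {
        for (Map.Entry<String, List<String>> e : PARENTS.entrySet()) {
            if (finished.containsAll(e.getValue())) {
                launched.add(e.getKey());
            }
        }
    }
}
```

Marking both parents finished and then running the scan once per completion event launches Stage-2 twice; the essence of the fix is to record the child as launched (or remove it from the runnable set) under the same synchronization that marks parents finished.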
[jira] [Updated] (HIVE-24577) Task resubmission bug
[ https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang updated HIVE-24577: --- Fix Version/s: 2.3.8 > Task resubmission bug > - > > Key: HIVE-24577 > URL: https://issues.apache.org/jira/browse/HIVE-24577 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.4 > Environment: hive-2.3.4 >Reporter: guojh >Assignee: hezhang >Priority: Major > Fix For: 2.3.8 > > > When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” parameter), tasks are submitted to YARN in parallel. If jobs complete simultaneously, their child task may be submitted more than once. > In our production cluster, we have a query whose stage dependencies are as below: > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1, Stage-10, Stage-14 > Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5 > Stage-4 > Stage-0 depends on stages: Stage-4, Stage-3, Stage-6 > Stage-3 > Stage-5 > Stage-6 depends on stages: Stage-5 > Stage-18 is a root stage > Stage-9 depends on stages: Stage-18 > Stage-10 depends on stages: Stage-9 > Stage-19 is a root stage > Stage-13 depends on stages: Stage-19 > Stage-14 depends on stages: Stage-13 > {code} > There is a certain probability that Stage-10 and Stage-14 complete at the same time, in which case their child Stage-2 is submitted twice. 
As the log below shows: > {code:java} > 2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 6 out of 6 > 2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > 2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 7 out of 6 > 2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang updated HIVE-25026: --- Attachment: HIVE-25026.patch > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-25026.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This issue is the same as HIVE-24577. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25028) Hive: Select query with IS operator producing unexpected result
[ https://issues.apache.org/jira/browse/HIVE-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das reassigned HIVE-25028: -- > Hive: Select query with IS operator producing unexpected result > --- > > Key: HIVE-25028 > URL: https://issues.apache.org/jira/browse/HIVE-25028 > Project: Hive > Issue Type: Bug > Components: Parser >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > > Hive: a select query with the IS operator is producing an unexpected result. > The following was executed on Postgres: > {code:java} > sqlancer=# create table if not exists emp(name text, age int); > CREATE TABLE > sqlancer=# insert into emp values ('a', 5), ('b', 15), ('c', 12); > INSERT 0 3 > sqlancer=# select emp.age from emp where emp.age > 10; > age > - > 15 > 12 > (2 rows) sqlancer=# select emp.age > 10 is true from emp; > ?column? > -- > f > t > t > (3 rows){code} > This happens because the IS operator has higher precedence than comparison > operators in Hive. In most other databases, comparison operators have higher > precedence. The grammar needs to be changed to fix the precedence. -- This message was sent by Atlassian Jira (v8.3.4#803005)
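The two parses can be compared outside SQL. In this hedged Java sketch, `standardParse` models the usual precedence `(age > 10) IS TRUE` (matching the Postgres output f, t, t), while `hiveParse` models `age > (10 IS TRUE)` under an assumed coercion (`10 IS TRUE` evaluates to TRUE, which compares as 1) chosen purely for illustration — Hive's actual cast behavior may differ:

```java
// Ages from the example table: ('a', 5), ('b', 15), ('c', 12).
class PrecedenceSketch {
    // Standard precedence, as in Postgres: (age > 10) IS TRUE.
    static boolean[] standardParse(int[] ages) {
        boolean[] out = new boolean[ages.length];
        for (int i = 0; i < ages.length; i++) {
            out[i] = ages[i] > 10;          // f, t, t for ages 5, 15, 12
        }
        return out;
    }

    // Hive's precedence: age > (10 IS TRUE), with the illustrative
    // assumption that the parenthesized term collapses to 1.
    static boolean[] hiveParse(int[] ages) {
        boolean[] out = new boolean[ages.length];
        for (int i = 0; i < ages.length; i++) {
            out[i] = ages[i] > 1;           // t, t, t for ages 5, 15, 12
        }
        return out;
    }
}
```

The point of the sketch is only that the two parse trees disagree on the same data, which is why the grammar change matters.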
[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25026: -- Labels: pull-request-available (was: ) > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This issue is the same as HIVE-24577. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?focusedWorklogId=585094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585094 ] ASF GitHub Bot logged work on HIVE-25026: - Author: ASF GitHub Bot Created on: 19/Apr/21 12:10 Start Date: 19/Apr/21 12:10 Worklog Time Spent: 10m Work Description: zhangheihei opened a new pull request #2189: URL: https://github.com/apache/hive/pull/2189 **Hive jobs can produce duplicate data because the same task is resubmitted** ``` 2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator 2021-04-05 06:05:52 CONSOLE# Launching Job 5 out of 4 2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator ``` https://user-images.githubusercontent.com/13237066/115213523-2d945800-a134-11eb-94c3-52095c748283.png For example, a Hive SQL plan explains to 4 tasks. When hive.exec.parallel=true and task2/task3 can execute in parallel, task4 is executed 2 times: 1. task1 is FINISHED, task2/task3 enter the runnable queue https://user-images.githubusercontent.com/13237066/115233371-65a69580-a14a-11eb-81fb-5a0c3582e3dc.png 2. task2/task3 are executed in parallel and end at the same time. Now task2/task3 are FINISHED https://user-images.githubusercontent.com/13237066/115233876-06955080-a14b-11eb-9570-7334eff8dcad.png 3. task2 is removed from the running queue, task4 enters the runnable queue 4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585094) Remaining Estimate: 0h Time Spent: 10m > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > This issue is the same as HIVE-24577. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask
[ https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585067 ] ASF GitHub Bot logged work on HIVE-25002: - Author: ASF GitHub Bot Created on: 19/Apr/21 10:42 Start Date: 19/Apr/21 10:42 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2167: URL: https://github.com/apache/hive/pull/2167#discussion_r615737849 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -211,7 +211,7 @@ private void stopWorkers() { } private List processOneTable(TableName fullTableName) - throws MetaException, NoSuchTxnException, NoSuchObjectException { + throws MetaException, NoSuchTxnException, NoSuchObjectException, TException { Review comment: will check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585067) Time Spent: 1h 20m (was: 1h 10m) > modify condition for target of replication in statsUpdaterThread and > PartitionManagementTask > > > Key: HIVE-25002 > URL: https://issues.apache.org/jira/browse/HIVE-25002 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask
[ https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585063 ] ASF GitHub Bot logged work on HIVE-25002: - Author: ASF GitHub Bot Created on: 19/Apr/21 10:40 Start Date: 19/Apr/21 10:40 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2167: URL: https://github.com/apache/hive/pull/2167#discussion_r615736521 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ## @@ -211,7 +211,7 @@ private void stopWorkers() { } private List processOneTable(TableName fullTableName) - throws MetaException, NoSuchTxnException, NoSuchObjectException { + throws MetaException, NoSuchTxnException, NoSuchObjectException, TException { Review comment: is the TException needed?> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585063) Time Spent: 1h 10m (was: 1h) > modify condition for target of replication in statsUpdaterThread and > PartitionManagementTask > > > Key: HIVE-25002 > URL: https://issues.apache.org/jira/browse/HIVE-25002 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25002) modify condition for target of replication in statsUpdaterThread and PartitionManagementTask
[ https://issues.apache.org/jira/browse/HIVE-25002?focusedWorklogId=585061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585061 ] ASF GitHub Bot logged work on HIVE-25002: - Author: ASF GitHub Bot Created on: 19/Apr/21 10:40 Start Date: 19/Apr/21 10:40 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #2167: URL: https://github.com/apache/hive/pull/2167#discussion_r615736279 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -140,7 +140,7 @@ public static final String RANGER_CONFIGURATION_RESOURCE_NAME = "ranger-hive-security.xml"; - public static final String TARGET_OF_REPLICATION = "repl.target.for"; + public static final String TARGET_OF_REPLICATION = ReplConst.TARGET_OF_REPLICATION; Review comment: then use the same constant everywhere -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585061) Time Spent: 1h (was: 50m) > modify condition for target of replication in statsUpdaterThread and > PartitionManagementTask > > > Key: HIVE-25002 > URL: https://issues.apache.org/jira/browse/HIVE-25002 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25027) Hide Iceberg module behind a profile
[ https://issues.apache.org/jira/browse/HIVE-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25027: -- Labels: pull-request-available (was: ) > Hide Iceberg module behind a profile > > > Key: HIVE-25027 > URL: https://issues.apache.org/jira/browse/HIVE-25027 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > After creating {{patched-iceberg-core}} and {{patched-iceberg-api}} modules > the maven build works fine, but IntelliJ needs manual classpath setup for the > build in IntelliJ to succeed. > Most of the community does not use Iceberg and eventually the "patched" > modules will be removed as the Hive-Iceberg integration stabilizes and the > Iceberg project releases the changes we need. In the meantime we just hide > the whole {{Iceberg}} module behind a profile which is only used on the CI > and if the developer specifically sets it. > It could be used like: > {code:java} > mvn clean install -DskipTests -Piceberg{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25027) Hide Iceberg module behind a profile
[ https://issues.apache.org/jira/browse/HIVE-25027?focusedWorklogId=585021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585021 ] ASF GitHub Bot logged work on HIVE-25027: - Author: ASF GitHub Bot Created on: 19/Apr/21 10:14 Start Date: 19/Apr/21 10:14 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #2188: URL: https://github.com/apache/hive/pull/2188 ### What changes were proposed in this pull request? Hide Iceberg module behind a profile ### Why are the changes needed? After creating patched-iceberg-core and patched-iceberg-api modules the maven build works fine, but IntelliJ needs manual classpath setup for the build in the IntelliJ to succeed. Most of the community does not use Iceberg and eventually the "patched" modules will be removed as the Hive-Iceberg integration stabilizes and the Iceberg project releases the changes we need. In the meantime we just hide the whole Iceberg module behind a profile which is only used on the CI and if the developer specifically sets it. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Rebuilt the project in maven and in IntelliJ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 585021) Remaining Estimate: 0h Time Spent: 10m > Hide Iceberg module behind a profile > > > Key: HIVE-25027 > URL: https://issues.apache.org/jira/browse/HIVE-25027 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > After creating {{patched-iceberg-core}} and {{patched-iceberg-api}} modules > the maven build works fine, but IntelliJ needs manual classpath setup for the > build in the IntelliJ to succeed. 
> Most of the community does not use Iceberg and eventually the "patched" > modules will be removed as the Hive-Iceberg integration stabilizes and the > Iceberg project releases the changes we need. In the meantime we just hide > the whole {{Iceberg}} module behind a profile which is only used on the CI > and if the developer specifically sets it. > It could be used like: > {code:java} > mvn clean install -DskipTests -Piceberg{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25027) Hide Iceberg module behind a profile
[ https://issues.apache.org/jira/browse/HIVE-25027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-25027: - > Hide Iceberg module behind a profile > > > Key: HIVE-25027 > URL: https://issues.apache.org/jira/browse/HIVE-25027 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > After creating {{patched-iceberg-core}} and {{patched-iceberg-api}} modules > the maven build works fine, but IntelliJ needs manual classpath setup for the > build in IntelliJ to succeed. > Most of the community does not use Iceberg and eventually the "patched" > modules will be removed as the Hive-Iceberg integration stabilizes and the > Iceberg project releases the changes we need. In the meantime we just hide > the whole {{Iceberg}} module behind a profile which is only used on the CI > and if the developer specifically sets it. > It could be used like: > {code:java} > mvn clean install -DskipTests -Piceberg{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor
[ https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584981 ] ASF GitHub Bot logged work on HIVE-24851: - Author: ASF GitHub Bot Created on: 19/Apr/21 09:01 Start Date: 19/Apr/21 09:01 Worklog Time Spent: 10m Work Description: pvary commented on pull request #2129: URL: https://github.com/apache/hive/pull/2129#issuecomment-822301658 Merged. Thanks for the fix and the work done to backport the change @losipiuk! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584981) Time Spent: 8h (was: 7h 50m) > resources leak on exception in AvroGenericRecordReader constructor > -- > > Key: HIVE-24851 > URL: https://issues.apache.org/jira/browse/HIVE-24851 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 4.0.0 >Reporter: Lukasz Osipiuk >Assignee: Lukasz Osipiuk >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3, 3.2.0, 4.0.0 > > Time Spent: 8h > Remaining Estimate: 0h > > AvroGenericRecordReader constructor creates an instance of FileReader but > lacks proper exception handling, and reader is not closed on the failure path. > This results in leaking of underlying resources (e.g. S3 connections). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
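The fix backported above follows the standard close-on-failure pattern: if the constructor throws after the reader has been opened, the reader is closed before the exception propagates. A minimal sketch with a stand-in resource (hypothetical names — not Hive's actual AvroGenericRecordReader code):

```java
class CloseOnFailureSketch {
    static class FakeReader implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() {
            closed = true;
        }
    }

    static FakeReader lastOpened; // exposed only to make the sketch checkable

    // If any setup step after opening the reader throws, close the reader
    // before rethrowing, so the underlying resource (e.g. an S3 connection)
    // is released instead of leaked.
    static FakeReader open(boolean failDuringSetup) throws Exception {
        FakeReader reader = new FakeReader();
        lastOpened = reader;
        try {
            if (failDuringSetup) {
                throw new RuntimeException("setup failed after open");
            }
            return reader;
        } catch (Exception e) {
            reader.close();
            throw e;
        }
    }
}
```

On the success path the caller owns the reader and closes it later; only the failure path closes eagerly, which is exactly what the original constructor was missing.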
[jira] [Updated] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor
[ https://issues.apache.org/jira/browse/HIVE-24851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-24851: -- Fix Version/s: 3.1.3 > resources leak on exception in AvroGenericRecordReader constructor > -- > > Key: HIVE-24851 > URL: https://issues.apache.org/jira/browse/HIVE-24851 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 4.0.0 >Reporter: Lukasz Osipiuk >Assignee: Lukasz Osipiuk >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3, 3.2.0, 4.0.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > AvroGenericRecordReader constructor creates an instance of FileReader but > lacks proper exception handling, and reader is not closed on the failure path. > This results in leaking of underlying resources (e.g. S3 connections). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor
[ https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584976 ] ASF GitHub Bot logged work on HIVE-24851: - Author: ASF GitHub Bot Created on: 19/Apr/21 08:56 Start Date: 19/Apr/21 08:56 Worklog Time Spent: 10m Work Description: pvary merged pull request #2129: URL: https://github.com/apache/hive/pull/2129 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584976) Time Spent: 7h 50m (was: 7h 40m) > resources leak on exception in AvroGenericRecordReader constructor > -- > > Key: HIVE-24851 > URL: https://issues.apache.org/jira/browse/HIVE-24851 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 4.0.0 >Reporter: Lukasz Osipiuk >Assignee: Lukasz Osipiuk >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0, 4.0.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > AvroGenericRecordReader constructor creates an instance of FileReader but > lacks proper exception handling, and reader is not closed on the failure path. > This results in leaking of underlying resources (e.g. S3 connections). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor
[ https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584972 ] ASF GitHub Bot logged work on HIVE-24851: - Author: ASF GitHub Bot Created on: 19/Apr/21 08:47 Start Date: 19/Apr/21 08:47 Worklog Time Spent: 10m Work Description: losipiuk commented on pull request #2129: URL: https://github.com/apache/hive/pull/2129#issuecomment-822291962 @pvary mergable? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584972) Time Spent: 7h 40m (was: 7.5h) > resources leak on exception in AvroGenericRecordReader constructor > -- > > Key: HIVE-24851 > URL: https://issues.apache.org/jira/browse/HIVE-24851 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 4.0.0 >Reporter: Lukasz Osipiuk >Assignee: Lukasz Osipiuk >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0, 4.0.0 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > AvroGenericRecordReader constructor creates an instance of FileReader but > lacks proper exception handling, and reader is not closed on the failure path. > This results in leaking of underlying resources (e.g. S3 connections). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24851) resources leak on exception in AvroGenericRecordReader constructor
[ https://issues.apache.org/jira/browse/HIVE-24851?focusedWorklogId=584970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584970 ] ASF GitHub Bot logged work on HIVE-24851: - Author: ASF GitHub Bot Created on: 19/Apr/21 08:46 Start Date: 19/Apr/21 08:46 Worklog Time Spent: 10m Work Description: losipiuk commented on pull request #2129: URL: https://github.com/apache/hive/pull/2129#issuecomment-822290914 Based on 3 runs it looks like all the test failures are flakes. The same tests (which FWIW seem totally unrelated to the change) fail on one run and pass on the other. * (1) http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/1/tests/ * Only preexisting failures * (2) http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/2/tests/ * TestSlotZnode.testConcurrencyNoFallback failed (green on (1) and (3)) * TestSlotZnode.testConcurrencyAndFallback failed (green on (1) and (3)) * (3) http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2129/3/tests/ * TestJdbcDriver2.testSelectExecAsync2 failed (was green on (2)) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584970) Time Spent: 7.5h (was: 7h 20m) > resources leak on exception in AvroGenericRecordReader constructor > -- > > Key: HIVE-24851 > URL: https://issues.apache.org/jira/browse/HIVE-24851 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 4.0.0 >Reporter: Lukasz Osipiuk >Assignee: Lukasz Osipiuk >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0, 4.0.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > > AvroGenericRecordReader constructor creates an instance of FileReader but > lacks proper exception handling, and reader is not closed on the failure path. > This results in leaking of underlying resources (e.g. S3 connections). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25019) Rename metrics that have spaces in the name
[ https://issues.apache.org/jira/browse/HIVE-25019?focusedWorklogId=584949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584949 ] ASF GitHub Bot logged work on HIVE-25019: - Author: ASF GitHub Bot Created on: 19/Apr/21 08:14 Start Date: 19/Apr/21 08:14 Worklog Time Spent: 10m Work Description: klcopp merged pull request #2183: URL: https://github.com/apache/hive/pull/2183 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584949) Time Spent: 20m (was: 10m) > Rename metrics that have spaces in the name > --- > > Key: HIVE-25019 > URL: https://issues.apache.org/jira/browse/HIVE-25019 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Metrics "num_compactions_ready for cleaning" and "num_compactions_not > initiated" contain spaces. > They should be renamed to "num_compactions_ready_for_cleaning" and > "num_compactions_not_initiated" respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25019) Rename metrics that have spaces in the name
[ https://issues.apache.org/jira/browse/HIVE-25019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-25019. -- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master branch. Thanks for your contribution [~asinkovits]! > Rename metrics that have spaces in the name > --- > > Key: HIVE-25019 > URL: https://issues.apache.org/jira/browse/HIVE-25019 > Project: Hive > Issue Type: Sub-task >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Metrics "num_compactions_ready for cleaning" and "num_compactions_not > initiated" contain spaces. > They should be renamed to "num_compactions_ready_for_cleaning" and > "num_compactions_not_initiated" respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25023) Optimize the operation of reading jar stream to avoid stream closed exception
[ https://issues.apache.org/jira/browse/HIVE-25023?focusedWorklogId=584948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584948 ] ASF GitHub Bot logged work on HIVE-25023: - Author: ASF GitHub Bot Created on: 19/Apr/21 08:13 Start Date: 19/Apr/21 08:13 Worklog Time Spent: 10m Work Description: dh20 commented on pull request #2185: URL: https://github.com/apache/hive/pull/2185#issuecomment-822269434 @sunchao hi,Sir, can you review it for me, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584948) Time Spent: 20m (was: 10m) > Optimize the operation of reading jar stream to avoid stream closed exception > - > > Key: HIVE-25023 > URL: https://issues.apache.org/jira/browse/HIVE-25023 > Project: Hive > Issue Type: Improvement > Components: JDBC >Affects Versions: 3.1.2 >Reporter: hao >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Optimize the operation of reading jar stream to avoid stream closed exception -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei resolved HIVE-25025. --- Resolution: Fixed The distcp method differs from FileUtil.copy: distcp ignores the parent directory and may therefore overwrite stats files, causing the stats info to be lost. The fix modifies the dst value, adding the parent directory to the dst path, so that the distcp result matches FileUtil.copy. > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Labels: patch, pull-request-available > Attachments: HIVE-25025.patch > > Original Estimate: 72h > Time Spent: 10m > Remaining Estimate: 71h 50m > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size > > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info is lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
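The dst adjustment described in the resolution can be sketched with java.nio paths (illustrative only — the real MoveTask code works with Hadoop Path objects and DistCp). FileUtil.copy places src as a child of dst, while a distcp-style copy puts the contents of src directly into dst, clobbering siblings such as the tmp stats files; appending the source directory name to dst makes the two layouts agree:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DistcpDstSketch {
    // Append the source directory's name to dst so a contents-into-dst
    // copy lands where FileUtil.copy would have put it, leaving sibling
    // files (e.g. tmp stats files) untouched.
    static Path adjustedDst(Path src, Path dst) {
        return dst.resolve(src.getFileName());
    }
}
```

The path names here are invented for the sketch; only the resolve-the-source-name step corresponds to the described fix.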
[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei updated HIVE-25025: -- Status: In Progress (was: Patch Available) The distcp method differs from FileUtil.copy: distcp ignores the parent directory and may therefore overwrite stats files, causing the stats info to be lost. The fix modifies the dst value, adding the parent directory to the dst path, so that the distcp result matches FileUtil.copy. > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Labels: patch, pull-request-available > Attachments: HIVE-25025.patch > > Original Estimate: 72h > Time Spent: 10m > Remaining Estimate: 71h 50m > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size > > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count > > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info is lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei updated HIVE-25025: -- Attachment: HIVE-25025.patch Labels: patch pull-request-available (was: pull-request-available) Status: Patch Available (was: In Progress) The distcp method differs from FileUtil.copy: distcp ignores the parent directory and may therefore overwrite stats files, causing stats info to be lost. The fix modifies the dst value by adding the parent directory to the dst path, so that the distcp result matches that of FileUtil.copy. > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Labels: pull-request-available, patch > Attachments: HIVE-25025.patch > > Original Estimate: 72h > Time Spent: 10m > Remaining Estimate: 71h 50m > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info was lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang updated HIVE-25026: --- Description: This issue is the same as HIVE-24577 > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > > This issue is the same as HIVE-24577 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25026) hive sql result is duplicate data cause of same task resubmission
[ https://issues.apache.org/jira/browse/HIVE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang reassigned HIVE-25026: -- > hive sql result is duplicate data cause of same task resubmission > - > > Key: HIVE-25026 > URL: https://issues.apache.org/jira/browse/HIVE-25026 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.1 >Reporter: hezhang >Assignee: hezhang >Priority: Critical > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?focusedWorklogId=584939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584939 ] ASF GitHub Bot logged work on HIVE-25025: - Author: ASF GitHub Bot Created on: 19/Apr/21 07:45 Start Date: 19/Apr/21 07:45 Worklog Time Spent: 10m Work Description: wanghualei opened a new pull request #2187: URL: https://github.com/apache/hive/pull/2187 After setting Run as end user instead of Hive user, when executing insert overwrite, in MoveTask, if the source size exceeds HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause the tmp stats file to be lost. example: set hive.exec.copyfile.maxsize=0; set hive.exec.copyfile.maxnumfiles=0; insert overwrite table abc_new select * from abc; select count(1) from abc_new ; select * from abc_new ; then the count(1) result will be 0, but select * will display the real data, because the stats info was lost. fix: The distcp method differs from FileUtil.copy: distcp ignores the parent directory and may therefore overwrite stats files, causing stats info to be lost. The fix modifies the dst value by adding the parent directory to the dst path, so that the distcp result matches that of FileUtil.copy. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584939) Remaining Estimate: 71h 50m (was: 72h) Time Spent: 10m > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Original Estimate: 72h > Time Spent: 10m > Remaining Estimate: 71h 50m > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info was lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
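The path adjustment described in the pull request above can be sketched as follows. This is one plausible reading of the fix, not Hive's actual MoveTask code; the helper name `adjustDst` and the example paths are hypothetical. The idea is that FileUtil.copy places the source directory itself under dst, while distcp copies only the directory's contents, so appending the source's last path component to dst makes the two layouts line up and keeps the staged stats files from being overwritten.

```java
import java.nio.file.Paths;

public class DistcpDestFix {
    // Hypothetical helper: append the source directory's own name to the
    // destination so that distcp (which copies directory *contents*)
    // produces the same layout as FileUtil.copy (which copies the
    // directory itself under dst), keeping staged stats files intact.
    static String adjustDst(String src, String dst) {
        String srcName = Paths.get(src).getFileName().toString();
        return Paths.get(dst, srcName).toString();
    }

    public static void main(String[] args) {
        // e.g. moving a staging dir into the target table directory
        System.out.println(adjustDst("/warehouse/tmp_stage", "/warehouse/abc_new"));
        // on Unix-style paths: /warehouse/abc_new/tmp_stage
    }
}
```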
[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25025: -- Labels: pull-request-available (was: ) > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Labels: pull-request-available > Original Estimate: 72h > Time Spent: 10m > Remaining Estimate: 71h 50m > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info was lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25025 started by WangHualei. - > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info was lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25017) Fix response in GetLatestCommittedCompaction
[ https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-25017. -- Resolution: Fixed Committed to master branch. Thanks for your contribution [~hsnusonic]! > Fix response in GetLatestCommittedCompaction > > > Key: HIVE-25017 > URL: https://issues.apache.org/jira/browse/HIVE-25017 > Project: Hive > Issue Type: Bug >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Dbname and Tablename are required for CompactionInfoStruct but the response > of getLatestCommittedCompactionInfo is not setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
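The gist of the fix for HIVE-25017 can be sketched like this. The struct below is a minimal stand-in, since the real CompactionInfoStruct is Thrift-generated and lives in Hive's metastore API; only the field names are taken from the issue description.

```java
// Minimal stand-in for the Thrift-generated CompactionInfoStruct; the
// field and setter names mirror the issue description but are not the
// actual generated API.
class CompactionInfoStruct {
    String dbname;
    String tablename;
    void setDbname(String dbname) { this.dbname = dbname; }
    void setTablename(String tablename) { this.tablename = tablename; }
}

public class CompactionResponseFix {
    // Populate the required dbname/tablename fields before returning the
    // struct from getLatestCommittedCompactionInfo, instead of leaving
    // required Thrift fields unset.
    static CompactionInfoStruct toStruct(String db, String table) {
        CompactionInfoStruct s = new CompactionInfoStruct();
        s.setDbname(db);
        s.setTablename(table);
        return s;
    }

    public static void main(String[] args) {
        CompactionInfoStruct s = toStruct("default", "t1");
        System.out.println(s.dbname + "." + s.tablename); // prints default.t1
    }
}
```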
[jira] [Updated] (HIVE-25017) Fix response in GetLatestCommittedCompaction
[ https://issues.apache.org/jira/browse/HIVE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-25017: - Fix Version/s: 4.0.0 > Fix response in GetLatestCommittedCompaction > > > Key: HIVE-25017 > URL: https://issues.apache.org/jira/browse/HIVE-25017 > Project: Hive > Issue Type: Bug >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Dbname and Tablename are required for CompactionInfoStruct but the response > of getLatestCommittedCompactionInfo is not setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25017) Fix response in GetLatestCommittedCompaction
[ https://issues.apache.org/jira/browse/HIVE-25017?focusedWorklogId=584930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584930 ] ASF GitHub Bot logged work on HIVE-25017: - Author: ASF GitHub Bot Created on: 19/Apr/21 07:26 Start Date: 19/Apr/21 07:26 Worklog Time Spent: 10m Work Description: klcopp merged pull request #2181: URL: https://github.com/apache/hive/pull/2181 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584930) Time Spent: 20m (was: 10m) > Fix response in GetLatestCommittedCompaction > > > Key: HIVE-25017 > URL: https://issues.apache.org/jira/browse/HIVE-25017 > Project: Hive > Issue Type: Bug >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Dbname and Tablename are required for CompactionInfoStruct but the response > of getLatestCommittedCompactionInfo is not setting them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei reassigned HIVE-25025: - > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive > Environment: example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abd_new select * from abc; > select count(*) from abd_new ; > select * from abd_new ; > then the count(*) result will be 0, but select * will display the real data, > because the stats info was lost. >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei updated HIVE-25025: -- Description: After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when executing insert overwrite, in MoveTask, if the source size exceeds HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause the tmp stats file to be lost. example: set hive.exec.copyfile.maxsize=0; set hive.exec.copyfile.maxnumfiles=0; insert overwrite table abc_new select * from abc; select count(1) from abc_new ; select * from abc_new ; then the count(1) result will be 0, but select * will display the real data, because the stats info was lost. was:after set _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_ , when execute insert overwrite , In MoveTask ,if source byte > HIVE_EXEC_COPYFILE_MAXSIZE and source file count> HIVE_EXEC_COPYFILE_MAXNUMFILES , HIve will use distcp method, it may cause tmp stats file lost. > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. > example: > set hive.exec.copyfile.maxsize=0; > set hive.exec.copyfile.maxnumfiles=0; > insert overwrite table abc_new select * from abc; > select count(1) from abc_new ; > select * from abc_new ; > then the count(1) result will be 0, but select * will display the real data, > because the stats info was lost. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25025) Distcp In MoveTask may cause stats info lost
[ https://issues.apache.org/jira/browse/HIVE-25025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WangHualei updated HIVE-25025: -- Environment: (was: example: set hive.exec.copyfile.maxsize=0; set hive.exec.copyfile.maxnumfiles=0; insert overwrite table abd_new select * from abc; select count(*) from abd_new ; select * from abd_new ; then the count(*) result will be 0, but select * will display real data, because stats info lost.) > Distcp In MoveTask may cause stats info lost > > > Key: HIVE-25025 > URL: https://issues.apache.org/jira/browse/HIVE-25025 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: WangHualei >Assignee: WangHualei >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > After setting _Run_ _as_ _end_ _user_ _instead_ _of_ _Hive_ _user_, when > executing insert overwrite, in MoveTask, if the source size exceeds > HIVE_EXEC_COPYFILE_MAXSIZE and the source file count exceeds > HIVE_EXEC_COPYFILE_MAXNUMFILES, Hive will use the distcp method, which may cause > the tmp stats file to be lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25009) Compaction worker and initiator version check can cause NPE if the COMPACTION_QUEUE is empty
[ https://issues.apache.org/jira/browse/HIVE-25009?focusedWorklogId=584925=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584925 ] ASF GitHub Bot logged work on HIVE-25009: - Author: ASF GitHub Bot Created on: 19/Apr/21 07:04 Start Date: 19/Apr/21 07:04 Worklog Time Spent: 10m Work Description: deniskuzZ merged pull request #2175: URL: https://github.com/apache/hive/pull/2175 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584925) Time Spent: 40m (was: 0.5h) > Compaction worker and initiator version check can cause NPE if the > COMPACTION_QUEUE is empty > > > Key: HIVE-25009 > URL: https://issues.apache.org/jira/browse/HIVE-25009 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 4.0.0 >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24577) Task resubmission bug
[ https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hezhang reassigned HIVE-24577: -- Assignee: hezhang (was: guojh) > Task resubmission bug > - > > Key: HIVE-24577 > URL: https://issues.apache.org/jira/browse/HIVE-24577 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.4 > Environment: hive-2.3.4 >Reporter: guojh >Assignee: hezhang >Priority: Major > > When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” > parameter), tasks are submitted to YARN in parallel. If parent jobs complete > simultaneously, their shared child task may be submitted more than once. > In our production cluster, we have a query whose stage dependencies are > shown below: > {code:java} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-2 depends on stages: Stage-1, Stage-10, Stage-14 > Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5 > Stage-4 > Stage-0 depends on stages: Stage-4, Stage-3, Stage-6 > Stage-3 > Stage-5 > Stage-6 depends on stages: Stage-5 > Stage-18 is a root stage > Stage-9 depends on stages: Stage-18 > Stage-10 depends on stages: Stage-9 > Stage-19 is a root stage > Stage-13 depends on stages: Stage-19 > Stage-14 depends on stages: Stage-13 > {code} > There is a certain probability that Stage-10 and Stage-14 will complete at > the same time, in which case their shared child Stage-2 is submitted twice. 
The log below shows this: > {code:java} > 2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 6 out of 6 > 2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > 2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Launching Job 7 out of 6 > 2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] > ql.Driver: Starting task [Stage-2:MAPRED] in parallel > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
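One way to prevent the double submission described above is an atomic launched-set check before a stage is handed to the executor. This is a generic illustration, not the actual Hive Driver code; the class and method names are invented for the sketch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LaunchGuard {
    // Stage ids that have already been handed to the executor.
    private final Set<String> launched = ConcurrentHashMap.newKeySet();

    // Set.add on a concurrent set is atomic: when two parent-stage threads
    // race to start the shared child (e.g. Stage-2), exactly one caller
    // sees true and actually submits it; the loser is rejected.
    boolean tryLaunch(String stageId) {
        return launched.add(stageId);
    }

    public static void main(String[] args) {
        LaunchGuard guard = new LaunchGuard();
        System.out.println(guard.tryLaunch("Stage-2")); // true: first submission
        System.out.println(guard.tryLaunch("Stage-2")); // false: duplicate rejected
    }
}
```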
[jira] [Resolved] (HIVE-25016) Error while running repl dump with with db regex
[ https://issues.apache.org/jira/browse/HIVE-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi resolved HIVE-25016. Resolution: Fixed > Error while running repl dump with with db regex > > > Key: HIVE-25016 > URL: https://issues.apache.org/jira/browse/HIVE-25016 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Doing incremental dump with create-database event with dbRegex `*` gives the > following exception : > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: Execution Error, return code -101 from > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.EximUtil.updateIfCustomDbLocations(EximUtil.java:388) > at > org.apache.hadoop.hive.ql.parse.EximUtil.createDbExportDump(EximUtil.java:357) > at > org.apache.hadoop.hive.ql.parse.repl.dump.events.CreateDatabaseHandler.handle(CreateDatabaseHandler.java:42) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpEvent(ReplDumpTask.java:827) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:632) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:209) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) -- This message was sent by Atlassian Jira (v8.3.4#803005)
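The NullPointerException in the stack trace above originates in EximUtil.updateIfCustomDbLocations during an incremental dump with the `*` dbRegex. A defensive pattern for this class of failure is sketched below; the method name and map type are hypothetical, not Hive's actual signature — the point is simply to substitute an empty map before the value is dereferenced further down the call chain.

```java
import java.util.Collections;
import java.util.Map;

public class DbLocationGuard {
    // Hypothetical null guard: when a regex-based dump provides no per-db
    // custom location map, fall back to an empty map instead of letting a
    // null propagate into later lookups.
    static Map<String, String> safeLocations(Map<String, String> customLocations) {
        return customLocations == null ? Collections.emptyMap() : customLocations;
    }

    public static void main(String[] args) {
        // A null map (as in the `*` regex dump) no longer blows up.
        System.out.println(safeLocations(null).isEmpty()); // prints true
    }
}
```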
[jira] [Commented] (HIVE-25016) Error while running repl dump with with db regex
[ https://issues.apache.org/jira/browse/HIVE-25016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324733#comment-17324733 ] Aasha Medhi commented on HIVE-25016: +1 Committed to master. Thank you for the patch [~^sharma] > Error while running repl dump with with db regex > > > Key: HIVE-25016 > URL: https://issues.apache.org/jira/browse/HIVE-25016 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Doing incremental dump with create-database event with dbRegex `*` gives the > following exception : > org.apache.hive.service.cli.HiveSQLException: Error while compiling > statement: FAILED: Execution Error, return code -101 from > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask. > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.EximUtil.updateIfCustomDbLocations(EximUtil.java:388) > at > org.apache.hadoop.hive.ql.parse.EximUtil.createDbExportDump(EximUtil.java:357) > at > org.apache.hadoop.hive.ql.parse.repl.dump.events.CreateDatabaseHandler.handle(CreateDatabaseHandler.java:42) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.dumpEvent(ReplDumpTask.java:827) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.incrementalDump(ReplDumpTask.java:632) > at > org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask.execute(ReplDumpTask.java:209) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324725#comment-17324725 ] László Pintér commented on HIVE-24928: -- Thanks [~kgyrtkirk] [~mbod] [~pvary] for the reviews! > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. > In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
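The delegation described in the issue above can be sketched as follows. The interface and method names are illustrative stand-ins, not the real HiveStorageHandler API: the point is that for non-native tables the handler's metadata-derived stats take precedence over a directory-scan estimate, which would otherwise count metadata files toward the totals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StatsSource {
    // Illustrative stand-in for a storage-handler hook that reports basic
    // statistics (row count, data size) from table metadata.
    interface BasicStatsProvider {
        Map<String, String> getBasicStatistics();
    }

    // Prefer handler-supplied stats for non-native tables; fall back to
    // the directory-scan estimate for native tables.
    static Map<String, String> basicStats(Optional<BasicStatsProvider> handler,
                                          Map<String, String> dirScanEstimate) {
        return handler.map(BasicStatsProvider::getBasicStatistics)
                      .orElse(dirScanEstimate);
    }

    public static void main(String[] args) {
        Map<String, String> fromMetadata = new HashMap<>();
        fromMetadata.put("numRows", "1000");
        BasicStatsProvider iceberg = () -> fromMetadata; // e.g. an Iceberg table
        // Handler stats win over the (metadata-inflated) directory scan.
        System.out.println(basicStats(Optional.of(iceberg),
                                      Map.of("numRows", "1042")).get("numRows"));
        // prints 1000
    }
}
```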
[jira] [Resolved] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-24928. -- Resolution: Fixed > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. > In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24928) In case of non-native tables use basic statistics from HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-24928?focusedWorklogId=584904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584904 ] ASF GitHub Bot logged work on HIVE-24928: - Author: ASF GitHub Bot Created on: 19/Apr/21 06:22 Start Date: 19/Apr/21 06:22 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2111: URL: https://github.com/apache/hive/pull/2111 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 584904) Time Spent: 6h 20m (was: 6h 10m) > In case of non-native tables use basic statistics from HiveStorageHandler > - > > Key: HIVE-24928 > URL: https://issues.apache.org/jira/browse/HIVE-24928 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > When we are running `ANALYZE TABLE ... COMPUTE STATISTICS` or `ANALYZE TABLE > ... COMPUTE STATISTICS FOR COLUMNS` all the basic statistics are collected by > the BasicStatsTask class. This class tries to estimate the statistics by > scanning the directory of the table. > In the case of non-native tables (iceberg, hbase), the table directory might > contain metadata files as well, which would be counted by the BasicStatsTask > when calculating basic stats. > Instead of having this logic, the HiveStorageHandler implementation should > provide basic statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)