[jira] [Work logged] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys
[ https://issues.apache.org/jira/browse/HIVE-25142?focusedWorklogId=599619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599619 ] ASF GitHub Bot logged work on HIVE-25142: - Author: ASF GitHub Bot Created on: 20/May/21 05:36 Start Date: 20/May/21 05:36 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #2300: URL: https://github.com/apache/hive/pull/2300 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599619) Remaining Estimate: 0h Time Spent: 10m > Rehashing in map join fast hash table causing corruption for large keys > > > Key: HIVE-25142 > URL: https://issues.apache.org/jira/browse/HIVE-25142 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In a map join the hash table is created using the keys. To support rehashing, > the keys are stored in a write buffer. The hash table contains the offset of > each key along with its hash code. When rehashing is done, the offset is > extracted from the hash table and then the hash code is generated again. For > large keys of size greater than 255, the key length is also stored along with > the key. In the fast hash table implementation the key is not extracted > correctly: a code bug causes the wrong key to be extracted and a wrong hash > code to be generated, corrupting the hash table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
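The bug described above is about a length-prefixed key layout: for keys larger than 255 bytes, the length is written into the write buffer ahead of the key bytes, and the rehash path must skip that prefix before recomputing the hash code. A minimal sketch of the idea follows; the layout, class, and method names here are illustrative, not Hive's actual MapJoin fast hash table code:

```java
import java.util.Arrays;

// Hypothetical sketch of a length-prefixed key layout in a write buffer.
// Small keys carry their length elsewhere (e.g. in the reference word);
// keys of 256 bytes or more store the length as a prefix in the buffer,
// so extraction during rehash must skip past that prefix before hashing.
public class KeyBuffer {
    private final byte[] buf = new byte[1 << 16];
    private int pos = 0;

    // Append a large key; returns the offset where the record starts.
    public int append(byte[] key) {
        int offset = pos;
        if (key.length >= 256) {
            // store a 4-byte big-endian length prefix (illustrative layout)
            buf[pos++] = (byte) (key.length >>> 24);
            buf[pos++] = (byte) (key.length >>> 16);
            buf[pos++] = (byte) (key.length >>> 8);
            buf[pos++] = (byte) key.length;
        }
        System.arraycopy(key, 0, buf, pos, key.length);
        pos += key.length;
        return offset;
    }

    // Correct extraction for a large key: read the stored length, then copy
    // the key bytes that FOLLOW the prefix. The class of bug described in
    // the ticket is starting the copy at the wrong position, so the rehash
    // computes a hash code over the wrong bytes.
    public byte[] extractLarge(int offset) {
        int len = ((buf[offset] & 0xFF) << 24) | ((buf[offset + 1] & 0xFF) << 16)
                | ((buf[offset + 2] & 0xFF) << 8) | (buf[offset + 3] & 0xFF);
        return Arrays.copyOfRange(buf, offset + 4, offset + 4 + len);
    }

    public static void main(String[] args) {
        KeyBuffer kb = new KeyBuffer();
        byte[] key = new byte[300];
        Arrays.fill(key, (byte) 'k');
        int off = kb.append(key);
        byte[] back = kb.extractLarge(off);
        // Round-trip must return the identical bytes, or rehashing would
        // recompute a hash code for the wrong key.
        System.out.println(Arrays.equals(key, back)); // prints "true"
    }
}
```

The round-trip check at the end is the invariant the fix restores: the bytes extracted at a stored offset must be exactly the bytes originally appended.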
[jira] [Updated] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys
[ https://issues.apache.org/jira/browse/HIVE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25142: -- Labels: pull-request-available (was: ) > Rehashing in map join fast hash table causing corruption for large keys > > > Key: HIVE-25142 > URL: https://issues.apache.org/jira/browse/HIVE-25142 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In a map join the hash table is created using the keys. To support rehashing, > the keys are stored in a write buffer. The hash table contains the offset of > each key along with its hash code. When rehashing is done, the offset is > extracted from the hash table and then the hash code is generated again. For > large keys of size greater than 255, the key length is also stored along with > the key. In the fast hash table implementation the key is not extracted > correctly: a code bug causes the wrong key to be extracted and a wrong hash > code to be generated, corrupting the hash table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25142) Rehashing in map join fast hash table causing corruption for large keys
[ https://issues.apache.org/jira/browse/HIVE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mahesh kumar behera reassigned HIVE-25142: -- > Rehashing in map join fast hash table causing corruption for large keys > > > Key: HIVE-25142 > URL: https://issues.apache.org/jira/browse/HIVE-25142 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > > In a map join the hash table is created using the keys. To support rehashing, > the keys are stored in a write buffer. The hash table contains the offset of > each key along with its hash code. When rehashing is done, the offset is > extracted from the hash table and then the hash code is generated again. For > large keys of size greater than 255, the key length is also stored along with > the key. In the fast hash table implementation the key is not extracted > correctly: a code bug causes the wrong key to be extracted and a wrong hash > code to be generated, corrupting the hash table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25136) Remove MetaExceptions From RawStore First Cut
[ https://issues.apache.org/jira/browse/HIVE-25136?focusedWorklogId=599576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599576 ] ASF GitHub Bot logged work on HIVE-25136: - Author: ASF GitHub Bot Created on: 20/May/21 02:26 Start Date: 20/May/21 02:26 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2290: URL: https://github.com/apache/hive/pull/2290#issuecomment-844634120 @miklosgergely This work regarding `MetaExceptions` is coming from my emails to dev@hive. This one is a bit more significant and removes many instances of these Thrift-generated `MetaException`s from the bowels of the Hive Metastore. Please let me know if you can take a look at this. Thanks! @nrg4878 too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599576) Time Spent: 20m (was: 10m) > Remove MetaExceptions From RawStore First Cut > - > > Key: HIVE-25136 > URL: https://issues.apache.org/jira/browse/HIVE-25136 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs
[ https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599575 ] ASF GitHub Bot logged work on HIVE-25127: - Author: ASF GitHub Bot Created on: 20/May/21 02:24 Start Date: 20/May/21 02:24 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2283: URL: https://github.com/apache/hive/pull/2283 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599575) Time Spent: 40m (was: 0.5h) > Remove Thrift Exceptions From RawStore getCatalogs > -- > > Key: HIVE-25127 > URL: https://issues.apache.org/jira/browse/HIVE-25127 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs
[ https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599573 ] ASF GitHub Bot logged work on HIVE-25127: - Author: ASF GitHub Bot Created on: 20/May/21 02:22 Start Date: 20/May/21 02:22 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #2283: URL: https://github.com/apache/hive/pull/2283 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599573) Time Spent: 0.5h (was: 20m) > Remove Thrift Exceptions From RawStore getCatalogs > -- > > Key: HIVE-25127 > URL: https://issues.apache.org/jira/browse/HIVE-25127 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25141) Review Error Level Logging in HMS Module
[ https://issues.apache.org/jira/browse/HIVE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25141: -- Labels: pull-request-available (was: ) > Review Error Level Logging in HMS Module > > > Key: HIVE-25141 > URL: https://issues.apache.org/jira/browse/HIVE-25141 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Remove "log *and* throw" (it should be one or the other) > * Remove superfluous code > * Ensure the stack traces are being logged (and not just the Exception > message) to ease troubleshooting > * Remove double-printing the Exception message (SLF4J dictates that the > Exception message will be printed as part of the logger's formatting) -- This message was sent by Atlassian Jira (v8.3.4#803005)
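The bullets above can be sketched in code. The HMS patch targets SLF4J, where the preferred form is LOG.error("msg", e) — passing the Throwable as the last argument so the full stack trace is logged and the exception message is not printed twice; the sketch below uses the JDK's own logger purely for self-containment, and the class and method names are illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration of the "log OR throw, not both" rule and of logging the
// Throwable itself (so the stack trace survives) rather than only its
// message string.
public class LogOrThrow {
    private static final Logger LOG = Logger.getLogger(LogOrThrow.class.getName());

    // Anti-pattern: logs AND rethrows, and drops the stack trace by
    // logging only the message string.
    static void badHandler(Exception e) throws Exception {
        LOG.severe("operation failed: " + e.getMessage()); // message only, no trace
        throw e; // the caller will likely log it again -> duplicate noise
    }

    // Preferred: either rethrow untouched (and let one outer layer log),
    // or, at that outermost layer, log the Throwable itself.
    static void goodHandler(Exception e) {
        LOG.log(Level.SEVERE, "operation failed", e); // full stack trace kept
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            goodHandler(e); // logged exactly once, with its stack trace
        }
    }
}
```

In SLF4J the same split is `LOG.error("operation failed: " + e.getMessage())` (bad: no trace, message duplicated by the formatter) versus `LOG.error("operation failed", e)` (good).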
[jira] [Work logged] (HIVE-25141) Review Error Level Logging in HMS Module
[ https://issues.apache.org/jira/browse/HIVE-25141?focusedWorklogId=599431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599431 ] ASF GitHub Bot logged work on HIVE-25141: - Author: ASF GitHub Bot Created on: 19/May/21 20:08 Start Date: 19/May/21 20:08 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2299: URL: https://github.com/apache/hive/pull/2299 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599431) Remaining Estimate: 0h Time Spent: 10m > Review Error Level Logging in HMS Module > > > Key: HIVE-25141 > URL: https://issues.apache.org/jira/browse/HIVE-25141 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > * Remove "log *and* throw" (it should be one or the other) > * Remove superfluous code > * Ensure the stack traces are being logged (and not just the Exception > message) to ease troubleshooting > * Remove double-printing the Exception message (SLF4J dictates that the > Exception message will be printed as part of the logger's formatting) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25141) Review Error Level Logging in HMS Module
[ https://issues.apache.org/jira/browse/HIVE-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25141: - > Review Error Level Logging in HMS Module > > > Key: HIVE-25141 > URL: https://issues.apache.org/jira/browse/HIVE-25141 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > * Remove "log *and* throw" (it should be one or the other) > * Remove superfluous code > * Ensure the stack traces are being logged (and not just the Exception > message) to ease troubleshooting > * Remove double-printing the Exception message (SLF4J dictates that the > Exception message will be printed as part of the logger's formatting) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread
[ https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=599384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599384 ] ASF GitHub Bot logged work on HIVE-25112: - Author: ASF GitHub Bot Created on: 19/May/21 18:10 Start Date: 19/May/21 18:10 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2270: URL: https://github.com/apache/hive/pull/2270#issuecomment-844347235 @klcopp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599384) Time Spent: 40m (was: 0.5h) > Simplify TXN Compactor Heartbeat Thread > --- > > Key: HIVE-25112 > URL: https://issues.apache.org/jira/browse/HIVE-25112 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Simplify the Thread structure. Threads do not need a separate "start"/"stop" > state; they already have one (running/interrupted), and they are designed to > work this way with thread pools and forced exits. -- This message was sent by Atlassian Jira (v8.3.4#803005)
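The point in the description above — that a thread's built-in running/interrupted state replaces a hand-rolled start/stop flag — can be sketched as follows. This is an illustrative worker, not the actual compactor heartbeat code:

```java
// A heartbeat-style worker that relies on the lifecycle a Thread already
// has: it runs until interrupt(), which both wakes a sleeping thread and
// sets the interruption flag the loop condition checks. No extra
// "started"/"stopped" boolean is needed, and thread pools can cancel it
// the same way.
public class HeartbeatWorker implements Runnable {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // do one heartbeat, then sleep until the next interval
                Thread.sleep(100);
            } catch (InterruptedException e) {
                // sleep() clears the flag when it throws; restore it so
                // the while-condition observes the interruption and exits
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new HeartbeatWorker());
        t.start();          // "running" state: the live thread itself
        Thread.sleep(250);
        t.interrupt();      // "stop" signal: interruption, no extra flag
        t.join(1000);
        System.out.println(t.isAlive()); // prints "false"
    }
}
```

Re-interrupting inside the catch block is the key idiom: `Thread.sleep` clears the interrupt status when it throws, so without the restore the loop condition would never see the stop signal.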
[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread
[ https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=599383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599383 ] ASF GitHub Bot logged work on HIVE-25112: - Author: ASF GitHub Bot Created on: 19/May/21 18:08 Start Date: 19/May/21 18:08 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #2270: URL: https://github.com/apache/hive/pull/2270#issuecomment-844345978 @miklosgergely :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599383) Time Spent: 0.5h (was: 20m) > Simplify TXN Compactor Heartbeat Thread > --- > > Key: HIVE-25112 > URL: https://issues.apache.org/jira/browse/HIVE-25112 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Simplify the Thread structure. Threads do not need a separate "start"/"stop" > state; they already have one (running/interrupted), and they are designed to > work this way with thread pools and forced exits. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347763#comment-17347763 ] Zoltan Haindrich commented on HIVE-24920: - [~ngangam], [~thejas]: I've updated the PR and implemented support so that TRANSLATED_TO_EXTERNAL tables may follow renames > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because it's an external table there is no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2; -- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347747#comment-17347747 ] Pravin Sinha commented on HIVE-24909: - Committed to master. Thanks for the patch, [~haymant] > Skip the repl events from getting logged in notification log > > > Key: HIVE-24909 > URL: https://issues.apache.org/jira/browse/HIVE-24909 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently REPL dump events are logged and replicated as part of the > replication policy. Whenever one replication cycle is completed, we always > have one transaction left open on the target corresponding to the repl dump > operation. This will never be cleaned up without manually dealing with the > transaction on the target cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347745#comment-17347745 ] Pravin Sinha edited comment on HIVE-24909 at 5/19/21, 3:48 PM: --- +1 Committed to master. Thanks for the patch, [~haymant] was (Author: pkumarsinha): Thanks for the patch, [~haymant] > Skip the repl events from getting logged in notification log > > > Key: HIVE-24909 > URL: https://issues.apache.org/jira/browse/HIVE-24909 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently REPL dump events are logged and replicated as part of the > replication policy. Whenever one replication cycle is completed, we always > have one transaction left open on the target corresponding to the repl dump > operation. This will never be cleaned up without manually dealing with the > transaction on the target cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347745#comment-17347745 ] Pravin Sinha edited comment on HIVE-24909 at 5/19/21, 3:48 PM: --- +1 was (Author: pkumarsinha): +1 Committed to master. Thanks for the patch, [~haymant] > Skip the repl events from getting logged in notification log > > > Key: HIVE-24909 > URL: https://issues.apache.org/jira/browse/HIVE-24909 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently REPL dump events are logged and replicated as part of the > replication policy. Whenever one replication cycle is completed, we always > have one transaction left open on the target corresponding to the repl dump > operation. This will never be cleaned up without manually dealing with the > transaction on the target cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha resolved HIVE-24909. - Resolution: Fixed Thanks for the patch, [~haymant] > Skip the repl events from getting logged in notification log > > > Key: HIVE-24909 > URL: https://issues.apache.org/jira/browse/HIVE-24909 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently REPL dump events are logged and replicated as part of the > replication policy. Whenever one replication cycle is completed, we always > have one transaction left open on the target corresponding to the repl dump > operation. This will never be cleaned up without manually dealing with the > transaction on the target cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24909) Skip the repl events from getting logged in notification log
[ https://issues.apache.org/jira/browse/HIVE-24909?focusedWorklogId=599315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599315 ] ASF GitHub Bot logged work on HIVE-24909: - Author: ASF GitHub Bot Created on: 19/May/21 15:44 Start Date: 19/May/21 15:44 Worklog Time Spent: 10m Work Description: pkumarsinha merged pull request #2101: URL: https://github.com/apache/hive/pull/2101 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599315) Time Spent: 7h 10m (was: 7h) > Skip the repl events from getting logged in notification log > > > Key: HIVE-24909 > URL: https://issues.apache.org/jira/browse/HIVE-24909 > Project: Hive > Issue Type: Bug >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently REPL dump events are logged and replicated as part of the > replication policy. Whenever one replication cycle is completed, we always > have one transaction left open on the target corresponding to the repl dump > operation. This will never be cleaned up without manually dealing with the > transaction on the target cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
[ https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25140: Description: Infrastructure is in place, except for exporters to Jaeger or OpenTelemetry (OTLP), which are omitted due to Thrift and protobuf version conflicts. A logging-only exporter is used. There are Spans for BeeLine and HiveServer2. The code was developed on branch-3.1 and porting Spans to the Hive MetaStore on master is taking more time due to major metastore code refactoring. was: Infrastructure is in place, except for exporters to Jaeger or OpenTelemetry (OTLP), which are omitted due to Thrift and protobuf version conflicts. Has Spans for BeeLine and HiveServer2. The code was developed on branch-3.1 and porting Spans to the Hive MetaStore on master is taking more time due to major code refactoring. > Hive Distributed Tracing -- Part 1: Disabled > > > Key: HIVE-25140 > URL: https://issues.apache.org/jira/browse/HIVE-25140 > Project: Hive > Issue Type: Sub-task >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > > Infrastructure is in place, except for exporters to Jaeger or OpenTelemetry > (OTLP), which are omitted due to Thrift and protobuf version conflicts. A > logging-only exporter is used. There are Spans for BeeLine and HiveServer2. > The code was developed on branch-3.1 and porting Spans to the Hive MetaStore > on master is taking more time due to major metastore code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
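A logging-only exporter of the kind described above can be sketched as below. The class, span name, and log format are illustrative, not Hive's actual tracing classes: a span just records a name and a start time, and "export" amounts to a log line emitted when the span closes:

```java
// Minimal sketch of a span with a logging-only exporter: instead of
// shipping the finished span to a Jaeger/OTLP collector, closing it
// simply logs the span's name and duration.
public class LoggingSpan implements AutoCloseable {
    private final String name;
    private final long startNanos = System.nanoTime();

    public LoggingSpan(String name) {
        this.name = name;
    }

    // Microseconds elapsed since the span was opened.
    public long elapsedMicros() {
        return (System.nanoTime() - startNanos) / 1_000;
    }

    @Override
    public void close() {
        // a real exporter would send this to a collector; here we only log
        System.out.println("span=" + name + " duration_us=" + elapsedMicros());
    }

    public static void main(String[] args) throws InterruptedException {
        // try-with-resources closes (and therefore "exports") the span
        try (LoggingSpan span = new LoggingSpan("BeeLine.executeQuery")) {
            Thread.sleep(5); // stand-in for the traced work
        }
    }
}
```

Swapping the `close()` body for a call into a real exporter is then the only change needed once the Thrift/protobuf conflicts are resolved, which is presumably why the ticket lands the infrastructure with logging first.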
[jira] [Assigned] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
[ https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-25140: --- > Hive Distributed Tracing -- Part 1: Disabled > > > Key: HIVE-25140 > URL: https://issues.apache.org/jira/browse/HIVE-25140 > Project: Hive > Issue Type: Sub-task >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > > Infrastructure is in place, except for exporters to Jaeger or OpenTelemetry > (OTLP), which are omitted due to Thrift and protobuf version conflicts. > Has Spans for BeeLine and HiveServer2. The code was developed on > branch-3.1 and porting Spans to the Hive MetaStore on master is taking more > time due to major code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs
[ https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599304 ] ASF GitHub Bot logged work on HIVE-25127: - Author: ASF GitHub Bot Created on: 19/May/21 15:24 Start Date: 19/May/21 15:24 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2283: URL: https://github.com/apache/hive/pull/2283 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599304) Time Spent: 20m (was: 10m) > Remove Thrift Exceptions From RawStore getCatalogs > -- > > Key: HIVE-25127 > URL: https://issues.apache.org/jira/browse/HIVE-25127 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs
[ https://issues.apache.org/jira/browse/HIVE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25127: -- Labels: pull-request-available (was: ) > Remove Thrift Exceptions From RawStore getCatalogs > -- > > Key: HIVE-25127 > URL: https://issues.apache.org/jira/browse/HIVE-25127 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25127) Remove Thrift Exceptions From RawStore getCatalogs
[ https://issues.apache.org/jira/browse/HIVE-25127?focusedWorklogId=599302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599302 ] ASF GitHub Bot logged work on HIVE-25127: - Author: ASF GitHub Bot Created on: 19/May/21 15:22 Start Date: 19/May/21 15:22 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #2283: URL: https://github.com/apache/hive/pull/2283 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599302) Remaining Estimate: 0h Time Spent: 10m > Remove Thrift Exceptions From RawStore getCatalogs > -- > > Key: HIVE-25127 > URL: https://issues.apache.org/jira/browse/HIVE-25127 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25069) Hive Distributed Tracing
[ https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25069: Attachment: (was: HIVE-25069.01.patch) > Hive Distributed Tracing > > > Key: HIVE-25069 > URL: https://issues.apache.org/jira/browse/HIVE-25069 > Project: Hive > Issue Type: New Feature >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: image-2021-05-10-09-20-54-688.png, > image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png > > > Instrument Hive code to gather distributed traces and export trace data to a > configurable collector. > Distributed tracing is a revolutionary tool for debugging issues. > We will use the new OpenTelemetry open-source standard that our industry has > aligned on. OpenTelemetry is the merger of two earlier distributed tracing > projects, OpenTracing and OpenCensus. > Next step: Add a design document that goes into more detail on the benefits of > distributed tracing and describes how Hive will be enhanced. > Also see: > HBASE-22120 Replace HTrace with OpenTelemetry -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HIVE-25069) Hive Distributed Tracing
[ https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25069: Comment: was deleted (was: A first Work-in-Progress patch. Work was done on branch-3.1 and manually merging changes to master is tedious. The Tracing infrastructure modules are in but only a few Hive classes have been merged. Enough though to give Hive QA a run. Tracing will be exported to a logging-only exporter.) > Hive Distributed Tracing > > > Key: HIVE-25069 > URL: https://issues.apache.org/jira/browse/HIVE-25069 > Project: Hive > Issue Type: New Feature >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: image-2021-05-10-09-20-54-688.png, > image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png > > > Instrument Hive code to gather distributed traces and export trace data to a > configurable collector. > Distributed tracing is a revolutionary tool for debugging issues. > We will use the new OpenTelemetry open-source standard that our industry has > aligned on. OpenTelemetry is the merger of two earlier distributed tracing > projects, OpenTracing and OpenCensus. > Next step: Add a design document that goes into more detail on the benefits of > distributed tracing and describes how Hive will be enhanced. > Also see: > HBASE-22120 Replace HTrace with OpenTelemetry -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HIVE-25069) Hive Distributed Tracing
[ https://issues.apache.org/jira/browse/HIVE-25069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-25069: Comment: was deleted (was: I'll try a pull request instead.) > Hive Distributed Tracing > > > Key: HIVE-25069 > URL: https://issues.apache.org/jira/browse/HIVE-25069 > Project: Hive > Issue Type: New Feature >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: image-2021-05-10-09-20-54-688.png, > image-2021-05-10-09-30-44-570.png, image-2021-05-10-19-06-02-679.png > > > Instrument Hive code to gather distributed traces and export trace data to a > configurable collector. > Distributed tracing is a revolutionary tool for debugging issues. > We will use the new OpenTelemetry open-source standard that our industry has > aligned on. OpenTelemetry is the merger of two earlier distributed tracing > projects, OpenTracing and OpenCensus. > Next step: Add a design document that goes into more detail on the benefits of > distributed tracing and describes how Hive will be enhanced. > Also see: > HBASE-22120 Replace HTrace with OpenTelemetry -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog
[ https://issues.apache.org/jira/browse/HIVE-25128?focusedWorklogId=599301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599301 ] ASF GitHub Bot logged work on HIVE-25128: - Author: ASF GitHub Bot Created on: 19/May/21 15:17 Start Date: 19/May/21 15:17 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2291: URL: https://github.com/apache/hive/pull/2291 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599301) Time Spent: 40m (was: 0.5h) > Remove Thrift Exceptions From RawStore alterCatalog > --- > > Key: HIVE-25128 > URL: https://issues.apache.org/jira/browse/HIVE-25128 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25128) Remove Thrift Exceptions From RawStore alterCatalog
[ https://issues.apache.org/jira/browse/HIVE-25128?focusedWorklogId=599300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599300 ] ASF GitHub Bot logged work on HIVE-25128: - Author: ASF GitHub Bot Created on: 19/May/21 15:16 Start Date: 19/May/21 15:16 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #2291: URL: https://github.com/apache/hive/pull/2291 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599300) Time Spent: 0.5h (was: 20m) > Remove Thrift Exceptions From RawStore alterCatalog > --- > > Key: HIVE-25128 > URL: https://issues.apache.org/jira/browse/HIVE-25128 > Project: Hive > Issue Type: Sub-task >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25010) Create TestIcebergCliDriver and TestIcebergNegativeCliDriver
[ https://issues.apache.org/jira/browse/HIVE-25010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347719#comment-17347719 ] Stamatis Zampetakis commented on HIVE-25010: I tried running the tests for this driver locally using the following command, which usually works fine with other CliDriver tests, but I got some weird failures such as the one shown below.
{code:sh}
cd itests/qtest
mvn test -Dtest=TestIcebergCliDriver -Dtest.output.overwrite
{code}
The exception that I got was the following:
{noformat}
org.apache.hive.iceberg.org.apache.iceberg.exceptions.NoSuchIcebergTableException
{noformat}
It turns out that in order to run the Iceberg tests, the project needs to be compiled with the iceberg profile enabled.
{code:sh}
mvn clean install -DskipTests -Pitests,iceberg
{code}
Failing to include the iceberg profile in the compilation can lead to more problems, since old versions of the iceberg module may be mixed with currently compiled SNAPSHOTs, making the problem harder to debug. > Create TestIcebergCliDriver and TestIcebergNegativeCliDriver > > > Key: HIVE-25010 > URL: https://issues.apache.org/jira/browse/HIVE-25010 > Project: Hive > Issue Type: Test >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > We should create Iceberg-specific drivers to run iceberg qtests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347702#comment-17347702 ] Zoltan Haindrich commented on HIVE-24920: - if we are about to do that, then an existing external table dir might cause some trouble:
{code}
create external table t (i integer); -- this will create dir WH/t
insert into t values (1);
drop table t; -- this will leave WH/t as is because it's a full external table without the purge option
create table t(i integer); -- this will create a table at the same external location, which is now occupied... your current proposal doesn't handle this case
select * from t
1 -- shows the inserted record from the previous table instance...
{code}
I don't think we should just accept the above behaviour: the user has used a statement which should have created a normal managed table (create table t) - so it should be empty in any circumstances. If we want to do the same kind of renames for translated tables, we should still retain the "existing location dir" avoidance mechanisms of the existing patch - and make the variant which throws an exception if the location exists the default. This could probably enable our users to choose the behaviour they would like to see. [~thejas]: what do you think? 
> TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25079) Create new metric about number of writes to tables with manually disabled compaction
[ https://issues.apache.org/jira/browse/HIVE-25079?focusedWorklogId=599244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599244 ] ASF GitHub Bot logged work on HIVE-25079: - Author: ASF GitHub Bot Created on: 19/May/21 13:45 Start Date: 19/May/21 13:45 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2281: URL: https://github.com/apache/hive/pull/2281#discussion_r635252137 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java ## @@ -687,6 +689,26 @@ static boolean equivalent(Map lhs, Map rhs) { return value.isEmpty()? Collections.emptyMap() : Splitter.on(',').withKeyValueSeparator("->").split(value); } + @Test + public void textWritesToDisabledCompactionTable() throws Exception { Review comment: nit: typo "text" -> "test" ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCompactionMetrics.java ## @@ -687,6 +689,26 @@ static boolean equivalent(Map lhs, Map rhs) { return value.isEmpty()? 
Collections.emptyMap() : Splitter.on(',').withKeyValueSeparator("->").split(value); } + @Test + public void textWritesToDisabledCompactionTable() throws Exception { +MetastoreConf.setVar(conf, MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS, "org.apache.hadoop.hive.metastore.HMSMetricsListener"); Review comment: nit: HMSMetricsListener.class.getName() would be nicer ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSMetricsListener.java ## @@ -86,4 +92,24 @@ public void onAddPartition(AddPartitionEvent partitionEvent) throws MetaExceptio Metrics.getOrCreateGauge(MetricsConstants.TOTAL_PARTITIONS).incrementAndGet(); createdParts.inc(); } + + @Override + public void onAllocWriteId(AllocWriteIdEvent allocWriteIdEvent, Connection dbConn, SQLGenerator sqlGenerator) throws MetaException { +Table table = getTable(allocWriteIdEvent); Review comment: Before doing anything we should first check if metrics are enabled -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599244) Time Spent: 0.5h (was: 20m) > Create new metric about number of writes to tables with manually disabled > compaction > > > Key: HIVE-25079 > URL: https://issues.apache.org/jira/browse/HIVE-25079 > Project: Hive > Issue Type: Bug >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Create a new metric that measures the number of writes to tables that have > compaction turned off manually. It does not matter if the write is committed > or aborted (both are bad...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook
[ https://issues.apache.org/jira/browse/HIVE-25139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25139: -- Labels: pull-request-available (was: ) > Filter out null table properties in HiveIcebergMetaHook > --- > > Key: HIVE-25139 > URL: https://issues.apache.org/jira/browse/HIVE-25139 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook
[ https://issues.apache.org/jira/browse/HIVE-25139?focusedWorklogId=599237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599237 ] ASF GitHub Bot logged work on HIVE-25139: - Author: ASF GitHub Bot Created on: 19/May/21 13:36 Start Date: 19/May/21 13:36 Worklog Time Spent: 10m Work Description: lcspinter opened a new pull request #2298: URL: https://github.com/apache/hive/pull/2298 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599237) Remaining Estimate: 0h Time Spent: 10m > Filter out null table properties in HiveIcebergMetaHook > --- > > Key: HIVE-25139 > URL: https://issues.apache.org/jira/browse/HIVE-25139 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface
[ https://issues.apache.org/jira/browse/HIVE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-24665. -- Resolution: Duplicate > Add commitAlterTable method to the HiveMetaHook interface > - > > Key: HIVE-24665 > URL: https://issues.apache.org/jira/browse/HIVE-24665 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: László Pintér >Priority: Major > > Currently we have pre and post hooks for create table and drop table > commands, but only a pre hook for alter table commands. We should add a post > hook as well (with a default implementation). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25139) Filter out null table properties in HiveIcebergMetaHook
[ https://issues.apache.org/jira/browse/HIVE-25139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér reassigned HIVE-25139: > Filter out null table properties in HiveIcebergMetaHook > --- > > Key: HIVE-25139 > URL: https://issues.apache.org/jira/browse/HIVE-25139 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25080) Create metric about oldest entry in "ready for cleaning" state
[ https://issues.apache.org/jira/browse/HIVE-25080?focusedWorklogId=599211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599211 ] ASF GitHub Bot logged work on HIVE-25080: - Author: ASF GitHub Bot Created on: 19/May/21 13:08 Start Date: 19/May/21 13:08 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2297: URL: https://github.com/apache/hive/pull/2297#discussion_r635220587 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -292,7 +292,10 @@ TxnStatus.OPEN + "' AND \"TXN_TYPE\" != "+ TxnType.REPL_CREATED.getValue() +") \"T\" CROSS JOIN (" + "SELECT COUNT(*), MIN(\"TXN_ID\"), ({0} - MIN(\"TXN_STARTED\"))/1000 FROM \"TXNS\" WHERE \"TXN_STATE\"='" + TxnStatus.ABORTED + "') \"A\" CROSS JOIN (" + - "SELECT COUNT(*), ({0} - MIN(\"HL_ACQUIRED_AT\"))/1000 FROM \"HIVE_LOCKS\") \"HL\""; + "SELECT COUNT(*), ({0} - MIN(\"HL_ACQUIRED_AT\"))/1000 FROM \"HIVE_LOCKS\") \"HL\" CROSS JOIN (" + + "SELECT ({0} - MIN(\"CQ_ENQUEUE_TIME\"))/1000 from \"COMPACTION_QUEUE\" WHERE " + Review comment: I think CQ_ENQUEUE_TIME is the time that the compaction was put in "initiated" state. Either the CQ_ENQUEUE_TIME value should be updated when the compaction is put in "ready for cleaning", or we need a new column in the compaction queue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599211) Time Spent: 20m (was: 10m) > Create metric about oldest entry in "ready for cleaning" state > -- > > Key: HIVE-25080 > URL: https://issues.apache.org/jira/browse/HIVE-25080 > Project: Hive > Issue Type: Bug >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When a compaction txn commits, COMPACTION_QUEUE.CQ_COMMIT_TIME is updated > with the current time. Then the compaction state is set to "ready for > cleaning". (... and then the Cleaner runs and the state is set to "succeeded" > hopefully) > Based on this we know (roughly) how long a compaction has been in state > "ready for cleaning". > We should create a metric similar to compaction_oldest_enqueue_age_in_sec > that would show that the cleaner is blocked by something i.e. find the > compaction in "ready for cleaning" that has the oldest commit time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
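The proposed metric boils down to simple epoch-millisecond arithmetic: the age in seconds is the current time minus the oldest matching timestamp, divided by 1000, exactly like the `({0} - MIN(...))/1000` expressions in the query above. A minimal sketch of that arithmetic (class and method names are illustrative, not Hive's):

```java
public class OldestReadyForCleaningAge {
    // Mirrors the ({0} - MIN(timestamp_column))/1000 arithmetic in the metrics
    // query: both operands are epoch milliseconds, the result is an age in seconds.
    static long ageSeconds(long nowMillis, long oldestMillis) {
        return (nowMillis - oldestMillis) / 1000L;
    }

    public static void main(String[] args) {
        long now = 1_621_430_000_000L;    // hypothetical "current time" in epoch millis
        long oldest = 1_621_426_400_000L; // hypothetical oldest commit time in epoch millis
        System.out.println(ageSeconds(now, oldest)); // 3600, i.e. one hour
    }
}
```

Whether the timestamp column is CQ_ENQUEUE_TIME or a new commit-time column (as the review comment suggests), the age computation itself stays the same.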
[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined
[ https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599190 ] ASF GitHub Bot logged work on HIVE-25109: - Author: ASF GitHub Bot Created on: 19/May/21 12:41 Start Date: 19/May/21 12:41 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2268: URL: https://github.com/apache/hive/pull/2268#discussion_r635199267 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -5031,7 +5038,7 @@ private RelNode genLogicalPlan(QB qb, boolean outerMostQB, // Build Rel for Constraint checks Pair constraintPair = - genConstraintFilterLogicalPlan(qb, srcRel, outerNameToPosMap, outerRR); + genConstraintFilterLogicalPlan(qb, selPair, outerNameToPosMap, outerRR); Review comment: Went through the code where `selectRel` gets its value and I found that it cannot be null: It comes from `internalGenSelectLogicalPlan`, which can create it in the following ways ``` outputRel = genUDTFPlan(genericUDTF, genericUDTFName, udtfTableAlias, udtfColAliases, qb, ... RelNode udtf = HiveTableFunctionScan.create(cluster, traitSet, list, rexNode, null, retType, null); outputRel = genSelectRelNode(columnList, outputRR, srcRel); ... HiveRelNode selRel = HiveProject.create( outputRel = new HiveAggregate(cluster, cluster.traitSetOf(HiveRelNode.CONVENTION), ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599190) Time Spent: 40m (was: 0.5h) > CBO fails when updating table has constraints defined > - > > Key: HIVE-25109 > URL: https://issues.apache.org/jira/browse/HIVE-25109 > Project: Hive > Issue Type: Bug > Components: CBO, Logical Optimizer >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > create table acid_uami_n0(i int, > de decimal(5,2) constraint nn1 not null enforced, > vc varchar(128) constraint ch2 CHECK (de >= cast(i as > decimal(5,2))) enforced) > clustered by (i) into 2 buckets stored as orc TBLPROPERTIES > ('transactional'='true'); > -- update > explain cbo > update acid_uami_n0 set de = 893.14 where de = 103.00; > {code} > hive.log > {code} > 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] > parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result > Schema didn't match Optimized Op Tree Schema > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.UpdateDeleteSe
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599188 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 12:35 Start Date: 19/May/21 12:35 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r635195134 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) .getMsg(destinationPath.toUri().toString())); } } + // handle direct insert CTAS case + // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler, + // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called Review comment: Rephrased the comment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599188) Time Spent: 4h 20m (was: 4h 10m) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25093) date_format() UDF is returning values in UTC time zone only
[ https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated HIVE-25093: - Description: *HIVE - 1.2* sshuser@hn0-dateti:~$ *timedatectl* Local time: Thu 2021-05-06 11:56:08 IST Universal time: Thu 2021-05-06 06:26:08 UTC RTC time: Thu 2021-05-06 06:26:08 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-dateti:~$ beeline 0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+--+ | _c0 | +--+--+ | 2021-05-06 11:58:53.760 IST | +--+--+ 1 row selected (1.271 seconds) *HIVE - 3.1.0* sshuser@hn0-testja:~$ *timedatectl* Local time: Thu 2021-05-06 12:03:32 IST Universal time: Thu 2021-05-06 06:33:32 UTC RTC time: Thu 2021-05-06 06:33:32 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-testja:~$ beeline 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *2021-05-06 06:33:59.078 UTC* | +--+ 1 row selected (13.396 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set hive.local.time.zone=Asia/Kolkata;* No rows affected (0.025 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | +--+ 1 row selected (1.074 seconds) The expected result was *2021-05-06 12:08:15.118 IST*. As part of HIVE-12192 it was decided to use a common time zone, "UTC", for all computation; as a result, the date_format() function was hard-coded to "UTC". But later, in HIVE-21039, it was decided that the user session time zone should be the default, not UTC. date_format() was not fixed as part of HIVE-21039. What should the ideal time zone value of date_format() be? 
was: *HIVE - 1.2* sshuser@hn0-dateti:~$ *timedatectl* Local time: Thu 2021-05-06 11:56:08 IST Universal time: Thu 2021-05-06 06:26:08 UTC RTC time: Thu 2021-05-06 06:26:08 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-dateti:~$ beeline 0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+--+ | _c0 | +--+--+ | 2021-05-06 11:58:53.760 IST | +--+--+ 1 row selected (1.271 seconds) *HIVE - 3.1.0* sshuser@hn0-testja:~$ *timedatectl* Local time: Thu 2021-05-06 12:03:32 IST Universal time: Thu 2021-05-06 06:33:32 UTC RTC time: Thu 2021-05-06 06:33:32 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-testja:~$ beeline 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(2021-05-06 12:03:32,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *2021-05-06 06:33:59.078 UTC* | +--+ 1 row selected (13.396 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set hive.local.time.zone=Asia/Kolkata;* No rows affected (0.025 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | +--+ 1 row selected (1.074 seconds) expected result was *2021-05-06 12:08:15.118 IST* As part of HIVE-12192 it was decided to have a common time zone for all computation i.e. "UTC". Due to which data_format() function was hard coded to "UTC". But later in HIVE-21039 it was decided that user session time zone value should be the default not UTC. date_format() was not fixed as part of HIVE-21039. what should be the ideal time zone value of date_format(). 
> date_format() UDF is returning values in UTC time zone only > > > Key: HIVE-25093 > URL: https://issues.apache.org/jira/browse/HIVE-25093 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >
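The root cause described above is a formatter whose zone is pinned to UTC instead of the session zone from hive.local.time.zone. A minimal java.time sketch of that difference (illustrative code only, not Hive's actual date_format() implementation):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateFormatZones {
    // Renders the same instant in an explicit zone, like date_format()'s "z" pattern.
    static String format(Instant ts, String zoneId) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z", Locale.ENGLISH)
                .withZone(ZoneId.of(zoneId))
                .format(ts);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-05-06T06:33:59Z");
        // A formatter hard-coded to UTC ignores any session-level zone setting:
        System.out.println(format(ts, "UTC"));
        // The same instant rendered in the session zone from the report
        // shifts to 12:03:59 (+05:30):
        System.out.println(format(ts, "Asia/Kolkata"));
    }
}
```

Honouring hive.local.time.zone would mean building the formatter from the session's zone id rather than a fixed "UTC", which is the behaviour the reporter expects.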
[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined
[ https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599186 ] ASF GitHub Bot logged work on HIVE-25109: - Author: ASF GitHub Bot Created on: 19/May/21 12:29 Start Date: 19/May/21 12:29 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #2268: URL: https://github.com/apache/hive/pull/2268#discussion_r635190684 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -3475,15 +3475,22 @@ private RelNode genFilterLogicalPlan(QB qb, RelNode srcRel, ImmutableMap(constraintRel, inputRR); + RelNode constraintRel = genFilterRelNode(constraintUDF, selPair.left, outerNameToPosMap, outerRR); + + List originalInputRefs = toRexNodeList(selPair.left); + List selectedRefs = Lists.newArrayList(); + for (int index = 0; index < selPair.right.getColumnInfos().size(); index++) { +selectedRefs.add(originalInputRefs.get(index)); + } Review comment: The Project may contain columns which are not in the top Project and not present in the row schema. However, these columns may be referenced in constraint filter expressions or sort and order by keys. I found that at the end of Project generation all columns coming from the input RowResolver of the Project are added to the output RowResolver: https://github.com/apache/hive/blob/d0d3f0aa50fa7b50ec74cae0dda0b93271799313/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L4738 Since these are added to the end of the list, the selected ones should be a prefix of the full list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599186) Time Spent: 0.5h (was: 20m) > CBO fails when updating table has constraints defined > - > > Key: HIVE-25109 > URL: https://issues.apache.org/jira/browse/HIVE-25109 > Project: Hive > Issue Type: Bug > Components: CBO, Logical Optimizer >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > create table acid_uami_n0(i int, > de decimal(5,2) constraint nn1 not null enforced, > vc varchar(128) constraint ch2 CHECK (de >= cast(i as > decimal(5,2))) enforced) > clustered by (i) into 2 buckets stored as orc TBLPROPERTIES > ('transactional'='true'); > -- update > explain cbo > update acid_uami_n0 set de = 893.14 where de = 103.00; > {code} > hive.log > {code} > 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] > parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result > Schema didn't match Optimized Op Tree Schema > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDelet
[jira] [Updated] (HIVE-25093) date_format() UDF is returning values in UTC time zone only
[ https://issues.apache.org/jira/browse/HIVE-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated HIVE-25093: - Description: *HIVE - 1.2* sshuser@hn0-dateti:~$ *timedatectl* Local time: Thu 2021-05-06 11:56:08 IST Universal time: Thu 2021-05-06 06:26:08 UTC RTC time: Thu 2021-05-06 06:26:08 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-dateti:~$ beeline 0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+--+ | _c0 | +--+--+ | 2021-05-06 11:58:53.760 IST | +--+--+ 1 row selected (1.271 seconds) *HIVE - 3.1.0* sshuser@hn0-testja:~$ *timedatectl* Local time: Thu 2021-05-06 12:03:32 IST Universal time: Thu 2021-05-06 06:33:32 UTC RTC time: Thu 2021-05-06 06:33:32 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-testja:~$ beeline 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(2021-05-06 12:03:32,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *2021-05-06 06:33:59.078 UTC* | +--+ 1 row selected (13.396 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set hive.local.time.zone=Asia/Kolkata;* No rows affected (0.025 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | +--+ 1 row selected (1.074 seconds) The expected result was *2021-05-06 12:08:15.118 IST*. As part of HIVE-12192 it was decided to use a common time zone, "UTC", for all computation; as a result, the date_format() function was hard-coded to "UTC". But later, in HIVE-21039, it was decided that the user session time zone should be the default, not UTC. date_format() was not fixed as part of HIVE-21039. What should the ideal time zone value of date_format() be? 
was: *HIVE - 1.2* sshuser@hn0-dateti:~$ *timedatectl* Local time: Thu 2021-05-06 11:56:08 IST Universal time: Thu 2021-05-06 06:26:08 UTC RTC time: Thu 2021-05-06 06:26:08 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-dateti:~$ beeline 0: jdbc:hive2://localhost:10001/default> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+--+ | _c0 | +--+--+ | 2021-05-06 11:58:53.760 IST | +--+--+ 1 row selected (1.271 seconds) *HIVE - 3.1.0* sshuser@hn0-testja:~$ *timedatectl* Local time: Thu 2021-05-06 12:03:32 IST Universal time: Thu 2021-05-06 06:33:32 UTC RTC time: Thu 2021-05-06 06:33:32 Time zone: Asia/Kolkata (IST, +0530) Network time on: yes NTP synchronized: yes RTC in local TZ: no sshuser@hn0-testja:~$ beeline 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *2021-05-06 06:33:59.078 UTC* | +--+ 1 row selected (13.396 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *set hive.local.time.zone=Asia/Kolkata;* No rows affected (0.025 seconds) 0: jdbc:hive2://zk0-testja.e0mrrixnyxde5h1suy> *select date_format(current_timestamp,"-MM-dd HH:mm:ss.SSS z");* +--+ | _c0 | +--+ | *{color:red}2021-05-06 12:08:15.118 UTC{color}* | +--+ 1 row selected (1.074 seconds) expected result was *2021-05-06 12:08:15.118 IST* As part of HIVE-12192 it was decided to have a common time zone for all computation i.e. "UTC". Due to which data_format() function was hard coded to "UTC". But later in HIVE-21039 it was decided that user session time zone value should be the default not UTC. date_format() was not fixed as part of HIVE-21039. what should be the ideal time zone value of date_format(). 
> date_format() UDF is returning values in UTC time zone only > > > Key: HIVE-25093 > URL: https://issues.apache.org/jira/browse/HIVE-25093 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 3.1.2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >
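The expected behavior described in this report — formatting the same instant in the session time zone instead of a hard-coded UTC — can be sketched with plain `java.time`. This is a hypothetical helper, not Hive's actual UDF code (the real implementation lives in `GenericUDFDateFormat` and works on Hive writables):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SessionZoneFormat {

    // Hypothetical stand-in for what date_format() should do after HIVE-21039:
    // honor the session's hive.local.time.zone instead of always using UTC.
    public static String format(Instant ts, String sessionZone) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS z")
                .withZone(ZoneId.of(sessionZone))
                .format(ts);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-05-06T06:33:32Z");
        // Hard-coded UTC (the HIVE-25093 behavior): 2021-05-06 06:33:32.000 ...
        System.out.println(format(ts, "UTC"));
        // Session zone honored (the expected behavior): 2021-05-06 12:03:32.000 ...
        // (the abbreviation printed for "z" depends on the JVM's locale data)
        System.out.println(format(ts, "Asia/Kolkata"));
    }
}
```

The same instant renders five and a half hours apart, which is exactly the discrepancy shown in the beeline transcripts above.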
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599170 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 12:02 Start Date: 19/May/21 12:02 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r635171207 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws IOException { HiveIcebergTestUtils.validateData(table, expected, 0); } + @Test + public void testCTASFromHiveTable() { +Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables", +testTableType == TestTables.TestTableType.HIVE_CATALOG); Review comment: As discussed, since the `Catalogs.createTable()` does not register the table in HMS for non-HiveCatalogs, any subsequent SELECTS for the target table wouldn't work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599170) Time Spent: 4h 10m (was: 4h) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599164 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 11:55 Start Date: 19/May/21 11:55 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r635167120 ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergCTASHook.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.iceberg.mr.hive; + +import java.util.Properties; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook; +import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext; +import org.apache.iceberg.mr.Catalogs; +import org.apache.iceberg.mr.InputFormatConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergCTASHook implements QueryLifeTimeHook { + + private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergCTASHook.class); + + @Override + public void beforeCompile(QueryLifeTimeHookContext ctx) { + + } + + @Override + public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) { +if (hasError) { + checkAndRollbackIcebergCTAS(ctx); +} + } + + @Override + public void beforeExecution(QueryLifeTimeHookContext ctx) { + + } + + @Override + public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) { +if (hasError) { + checkAndRollbackIcebergCTAS(ctx); +} + } + + private void checkAndRollbackIcebergCTAS(QueryLifeTimeHookContext ctx) { +HiveConf conf = ctx.getHiveConf(); +String queryId = conf.getVar(HiveConf.ConfVars.HIVEQUERYID); +if (conf.getBoolean(String.format(InputFormatConfig.IS_CTAS_QUERY_TEMPLATE, queryId), false)) { + try { +String tableName = conf.get(String.format(InputFormatConfig.CTAS_TABLE_NAME_TEMPLATE, queryId)); +LOG.info("Dropping the following CTAS target table as part of rollback: {}", tableName); +Properties props = new Properties(); +props.put(Catalogs.NAME, tableName); +Catalogs.dropTable(conf, props); Review comment: Good point. As discussed, the table properties of the target table should contain the catalog_name (and the corresponding fields such as type), so we should drop the table from the correct catalog. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599164) Time Spent: 4h (was: 3h 50m) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
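The rollback hook discussed above keys its state on the query id. That pattern — per-query config keys built from format templates — can be sketched as follows; the template string values here are made-up placeholders, since the real constants live in `org.apache.iceberg.mr.InputFormatConfig`:

```java
public class CtasQueryFlags {
    // Hypothetical template values -- the real constants are defined in
    // org.apache.iceberg.mr.InputFormatConfig.
    static final String IS_CTAS_QUERY_TEMPLATE = "iceberg.mr.ctas.is-ctas.%s";
    static final String CTAS_TABLE_NAME_TEMPLATE = "iceberg.mr.ctas.table-name.%s";

    public static void main(String[] args) {
        // At runtime the query id comes from HiveConf.ConfVars.HIVEQUERYID.
        String queryId = "hive_20210519073000_abc";

        // Keying every flag on the query id keeps concurrent CTAS queries from
        // reading each other's rollback state out of the shared configuration.
        String isCtasKey = String.format(IS_CTAS_QUERY_TEMPLATE, queryId);
        String tableKey = String.format(CTAS_TABLE_NAME_TEMPLATE, queryId);
        System.out.println(isCtasKey);
        System.out.println(tableKey);
    }
}
```

The hook's `checkAndRollbackIcebergCTAS` reads these per-query keys back after a failed compile or execution and drops only the target table that this particular query created.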
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599163 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 11:54 Start Date: 19/May/21 11:54 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r635166430 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws IOException { HiveIcebergTestUtils.validateData(table, expected, 0); } + @Test + public void testCTASFromHiveTable() { +Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables", +testTableType == TestTables.TestTableType.HIVE_CATALOG); + +shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC"); +shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 'Linda', 'Finance')"); + +shell.executeStatement(String.format( +"CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source", +HiveIcebergStorageHandler.class.getName(), +testTables.locationForCreateTableSQL(TableIdentifier.of("default", "target")), +TableProperties.DEFAULT_FILE_FORMAT, fileFormat)); + +List objects = shell.executeStatement("SELECT * FROM target ORDER BY id"); +Assert.assertEquals(2, objects.size()); +Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0)); +Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, objects.get(1)); + } + + @Test + public void testCTASFromDifferentIcebergCatalog() { +Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables", +testTableType == TestTables.TestTableType.HIVE_CATALOG); + +// get source data from a different catalog 
+shell.executeStatement(String.format( +"CREATE TABLE source STORED BY '%s' LOCATION '%s' TBLPROPERTIES ('%s'='%s', '%s'='%s')", +HiveIcebergStorageHandler.class.getName(), +temp.getRoot().getPath() + "/default/source/", +InputFormatConfig.CATALOG_NAME, Catalogs.ICEBERG_HADOOP_TABLE_NAME, +InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA))); +shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'Roger'), (2, 'Linda', 'Albright')"); + +// CTAS into a new HiveCatalog table +shell.executeStatement(String.format( +"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source", +HiveIcebergStorageHandler.class.getName(), +TableProperties.DEFAULT_FILE_FORMAT, fileFormat)); + +List objects = shell.executeStatement("SELECT * FROM target ORDER BY customer_id"); +Assert.assertEquals(2, objects.size()); +Assert.assertArrayEquals(new Object[]{1L, "Mike", "Roger"}, objects.get(0)); +Assert.assertArrayEquals(new Object[]{2L, "Linda", "Albright"}, objects.get(1)); + } + + @Test + public void testCTASFailureRollback() throws IOException { Review comment: Sure! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599163) Time Spent: 3h 50m (was: 3h 40m) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined
[ https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=599138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599138 ] ASF GitHub Bot logged work on HIVE-25109: - Author: ASF GitHub Bot Created on: 19/May/21 10:56 Start Date: 19/May/21 10:56 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2268: URL: https://github.com/apache/hive/pull/2268#discussion_r635129376 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -5031,7 +5038,7 @@ private RelNode genLogicalPlan(QB qb, boolean outerMostQB, // Build Rel for Constraint checks Pair constraintPair = - genConstraintFilterLogicalPlan(qb, srcRel, outerNameToPosMap, outerRR); + genConstraintFilterLogicalPlan(qb, selPair, outerNameToPosMap, outerRR); Review comment: will this work okay when `selectRel == null`? previous code was passing `srcRel` which is optionally the previous `srcRel` ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -3475,15 +3475,22 @@ private RelNode genFilterLogicalPlan(QB qb, RelNode srcRel, ImmutableMap(constraintRel, inputRR); + RelNode constraintRel = genFilterRelNode(constraintUDF, selPair.left, outerNameToPosMap, outerRR); + + List originalInputRefs = toRexNodeList(selPair.left); + List selectedRefs = Lists.newArrayList(); + for (int index = 0; index < selPair.right.getColumnInfos().size(); index++) { +selectedRefs.add(originalInputRefs.get(index)); + } Review comment: I'm not sure about this; this block could be replaced with something like ``` selectedRefs.addAll(originalInputRefs.sublist(selPair.right.getColumnInfos().size())) ``` which looks odd to me because it would mean that the `selected` ones may only be a prefix of the original ones - is that true in every case? shouldn't this code be checking the `ref` of the `RexInputRefs` -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599138) Time Spent: 20m (was: 10m) > CBO fails when updating table has constraints defined > - > > Key: HIVE-25109 > URL: https://issues.apache.org/jira/browse/HIVE-25109 > Project: Hive > Issue Type: Bug > Components: CBO, Logical Optimizer >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > create table acid_uami_n0(i int, > de decimal(5,2) constraint nn1 not null enforced, > vc varchar(128) constraint ch2 CHECK (de >= cast(i as > decimal(5,2))) enforced) > clustered by (i) into 2 buckets stored as orc TBLPROPERTIES > ('transactional'='true'); > -- update > explain cbo > update acid_uami_n0 set de = 893.14 where de = 103.00; > {code} > hive.log > {code} > 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] > parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result > Schema didn't match Optimized Op Tree Schema > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInte
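The prefix question raised in the review above boils down to whether taking the first N input refs (`subList(0, N)`) is safe. A minimal sketch, with plain integers standing in for Calcite `RexInputRef`s:

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixRefsSketch {
    public static void main(String[] args) {
        // Stand-ins for the input refs produced for the constraint filter's input.
        List<Integer> originalInputRefs = List.of(10, 11, 12, 13, 14);
        int selectedColumnCount = 3; // selPair.right.getColumnInfos().size() in the patch

        // The index loop in the patch is equivalent to taking a prefix:
        List<Integer> selectedRefs =
                new ArrayList<>(originalInputRefs.subList(0, selectedColumnCount));
        System.out.println(selectedRefs);

        // ...which is only correct if the selected columns really are the first N
        // inputs; otherwise the code would have to match on each ref's index instead.
    }
}
```

That is exactly the reviewer's concern: a prefix copy silently assumes the selected refs lead the original list in order.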
[jira] [Work logged] (HIVE-25117) Vector PTF ClassCastException with Decimal64
[ https://issues.apache.org/jira/browse/HIVE-25117?focusedWorklogId=599123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599123 ] ASF GitHub Bot logged work on HIVE-25117: - Author: ASF GitHub Bot Created on: 19/May/21 10:28 Start Date: 19/May/21 10:28 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2286: URL: https://github.com/apache/hive/pull/2286#discussion_r635106215 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ## @@ -4962,9 +4969,8 @@ private static void createVectorPTFDesc(Operator ptfOp, evaluatorWindowFrameDefs, evaluatorInputExprNodeDescLists); -TypeInfo[] reducerBatchTypeInfos = vContext.getAllTypeInfos(); - vectorPTFDesc.setReducerBatchTypeInfos(reducerBatchTypeInfos); + vectorPTFDesc.setReducerBatchDataTypePhysicalVariations(reducerBatchDataTypePhysicalVariations); Review comment: Shall we have setReducerBatchTypeInfos(Types, TypeVariations) instead of creating a separate method? Seems like a good practice making sure we pas TypeVariations along with Types. 
## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPTFDesc.java ## @@ -487,10 +495,18 @@ public void setOutputColumnNames(String[] outputColumnNames) { return outputTypeInfos; } + public DataTypePhysicalVariation[] getOutputDataTypePhysicalVariations() { +return outputDataTypePhysicalVariations; + } + public void setOutputTypeInfos(TypeInfo[] outputTypeInfos) { this.outputTypeInfos = outputTypeInfos; } + public void setOutputDataTypePhysicalVariations(DataTypePhysicalVariation[] outputDataTypePhysicalVariations) { Review comment: As per comment above this can be simplified ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java ## @@ -4978,6 +4984,7 @@ private static void createVectorPTFDesc(Operator ptfOp, vectorPTFDesc.setOutputColumnNames(outputColumnNames); vectorPTFDesc.setOutputTypeInfos(outputTypeInfos); + vectorPTFDesc.setOutputDataTypePhysicalVariations(outputDataTypePhysicalVariations); Review comment: Same comment as above for outputTypes ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFOperator.java ## @@ -250,13 +253,16 @@ protected VectorizedRowBatch setupOverflowBatch() throws HiveException { for (int i = 0; i < outputProjectionColumnMap.length; i++) { int outputColumn = outputProjectionColumnMap[i]; String typeName = outputTypeInfos[i].getTypeName(); - allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName); + allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName, outputDataTypePhysicalVariations[i]); } // Now, add any scratch columns needed for children operators. 
int outputColumn = initialColumnCount; +DataTypePhysicalVariation[] dataTypePhysicalVariations = vOutContext.getScratchDataTypePhysicalVariations(); for (String typeName : vOutContext.getScratchColumnTypeNames()) { - allocateOverflowBatchColumnVector(overflowBatch, outputColumn++, typeName); + allocateOverflowBatchColumnVector(overflowBatch, outputColumn, typeName, + dataTypePhysicalVariations[outputColumn-initialColumnCount]); Review comment: I would expect the ScratchCol Types not to include the initial outputCols Types. Why do we need the outputColumn-initialColumnCount here? ## File path: ql/src/java/org/apache/hadoop/hive/ql/plan/VectorPTFDesc.java ## @@ -419,6 +422,11 @@ public void setReducerBatchTypeInfos(TypeInfo[] reducerBatchTypeInfos) { this.reducerBatchTypeInfos = reducerBatchTypeInfos; } + public void setReducerBatchDataTypePhysicalVariations( Review comment: As per comment above this can be simplified -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599123) Time Spent: 20m (was: 10m) > Vector PTF ClassCastException with Decimal64 > > > Key: HIVE-25117 > URL: https://issues.apache.org/jira/browse/HIVE-25117 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Attachments: vector_ptf_classcast_exception.q > > Time Spent: 20m > Remaining Estimate: 0h > > Only reproduces when there is at least
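The recurring review suggestion above — pass the `DataTypePhysicalVariation[]` together with the `TypeInfo[]` in one setter — can be sketched like this, with `String` arrays standing in for Hive's actual types:

```java
public class VectorBatchSchemaSketch {
    private String[] typeInfos;       // stand-in for TypeInfo[]
    private String[] typeVariations;  // stand-in for DataTypePhysicalVariation[]

    // One setter taking both arrays makes it impossible to set the type infos
    // without their physical variations, which is what the reviewer asks for.
    public void setReducerBatchTypeInfos(String[] types, String[] variations) {
        if (types.length != variations.length) {
            throw new IllegalArgumentException("typeInfos and variations must align");
        }
        this.typeInfos = types;
        this.typeVariations = variations;
    }

    public String describe(int i) {
        return typeInfos[i] + "/" + typeVariations[i];
    }

    public static void main(String[] args) {
        VectorBatchSchemaSketch schema = new VectorBatchSchemaSketch();
        schema.setReducerBatchTypeInfos(
                new String[]{"decimal(10,2)", "bigint"},
                new String[]{"DECIMAL_64", "NONE"});
        System.out.println(schema.describe(0));
    }
}
```

Keeping the two arrays paired at the API boundary is what prevents the Decimal64 `ClassCastException` class of bugs, where a column vector is allocated for the logical type but read with its physical variation.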
[jira] [Assigned] (HIVE-25138) Auto disable scheduled queries after repeated failures
[ https://issues.apache.org/jira/browse/HIVE-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25138: --- > Auto disable scheduled queries after repeated failures > -- > > Key: HIVE-25138 > URL: https://issues.apache.org/jira/browse/HIVE-25138 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24936) Fix file name parsing and copy file move.
[ https://issues.apache.org/jira/browse/HIVE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish JP updated HIVE-24936:
-----------------------------
Description:

The taskId and taskAttemptId are not extracted correctly for copy files (1_02_copy_3), and when moving an incompatible copy file the rename utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 1_02_copy_N.

Incompatible files should always be renamed using the current task, or they can get deleted if the file name conflicts with another task's output file. Ex: if the input file name for a task is 5_01 and is incompatible, then when we move this file it will be treated as an output file for task id 5, attempt 1, which, if it exists, will try to generate the same file, fail, and trigger another attempt. There will then be two files, 5_01 and 5_02; the deduping code will remove 5_01, resulting in data loss. There are other scenarios where the same can happen.

was: The taskId and taskAttemptId are not extracted correctly for copy files (1_02_copy_3), and when moving an incompatible copy file the rename utility generates wrong file names. Ex: 1_02_copy_3 is renamed to 1_02_copy_3_1 if 1_02_copy_3 already exists; ideally it should be 1_02_copy_N.

> Fix file name parsing and copy file move.
> -----------------------------------------
>
> Key: HIVE-24936
> URL: https://issues.apache.org/jira/browse/HIVE-24936
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Harish JP
> Assignee: Harish JP
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files
> (1_02_copy_3), and when moving an incompatible copy file the rename utility
> generates wrong file names. Ex: 1_02_copy_3 is renamed to 1_02_copy_3_1 if
> 1_02_copy_3 already exists; ideally it should be 1_02_copy_N.
>
> Incompatible files should always be renamed using the current task, or they
> can get deleted if the file name conflicts with another task's output file.
> Ex: if the input file name for a task is 5_01 and is incompatible, then when
> we move this file it will be treated as an output file for task id 5,
> attempt 1, which, if it exists, will try to generate the same file, fail,
> and trigger another attempt. There will then be two files, 5_01 and 5_02;
> the deduping code will remove 5_01, resulting in data loss. There are other
> scenarios where the same can happen.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
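The parsing bug described above — `_copy_N` suffixes confusing task-id and attempt-id extraction — can be illustrated with a simplified regex. This is not Hive's actual `FILE_NAME_PREFIXED_TASK_ID_REGEX`, only a sketch of the grouping the fix needs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskFileNameSketch {
    // Simplified: taskId _ attemptId with an optional _copy_N suffix. Hive's
    // real regex also handles prefixes, extensions, and other file layouts.
    private static final Pattern NAME =
            Pattern.compile("^([0-9]+)_([0-9]+)(?:_copy_([0-9]+))?$");

    public static String taskId(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static int attemptId(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        // The bug report's case: for a copy file, the task id must come from the
        // first group and the attempt id from the second -- the _copy_N suffix
        // must not leak into either value.
        System.out.println(taskId("000001_02_copy_3"));
        System.out.println(attemptId("000001_02_copy_3"));
    }
}
```

With the suffix captured in its own group, renaming on collision can increment the copy number (`_copy_N`) instead of appending a spurious `_1`, which is the behavior the fix describes.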
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599064 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 08:20 Start Date: 19/May/21 08:20 Worklog Time Spent: 10m Work Description: lcspinter commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r635016099 ## File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java ## @@ -514,6 +519,75 @@ public void testInsertOverwritePartitionedTable() throws IOException { HiveIcebergTestUtils.validateData(table, expected, 0); } + @Test + public void testCTASFromHiveTable() { +Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables", +testTableType == TestTables.TestTableType.HIVE_CATALOG); + +shell.executeStatement("CREATE TABLE source (id bigint, name string) PARTITIONED BY (dept string) STORED AS ORC"); +shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'HR'), (2, 'Linda', 'Finance')"); + +shell.executeStatement(String.format( +"CREATE TABLE target STORED BY '%s' %s TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source", +HiveIcebergStorageHandler.class.getName(), +testTables.locationForCreateTableSQL(TableIdentifier.of("default", "target")), +TableProperties.DEFAULT_FILE_FORMAT, fileFormat)); + +List objects = shell.executeStatement("SELECT * FROM target ORDER BY id"); +Assert.assertEquals(2, objects.size()); +Assert.assertArrayEquals(new Object[]{1L, "Mike", "HR"}, objects.get(0)); +Assert.assertArrayEquals(new Object[]{2L, "Linda", "Finance"}, objects.get(1)); + } + + @Test + public void testCTASFromDifferentIcebergCatalog() { +Assume.assumeTrue("CTAS target table is supported only for HiveCatalog tables", +testTableType == TestTables.TestTableType.HIVE_CATALOG); + +// get source data from a different catalog 
+shell.executeStatement(String.format( +"CREATE TABLE source STORED BY '%s' LOCATION '%s' TBLPROPERTIES ('%s'='%s', '%s'='%s')", +HiveIcebergStorageHandler.class.getName(), +temp.getRoot().getPath() + "/default/source/", +InputFormatConfig.CATALOG_NAME, Catalogs.ICEBERG_HADOOP_TABLE_NAME, +InputFormatConfig.TABLE_SCHEMA, SchemaParser.toJson(HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA))); +shell.executeStatement("INSERT INTO source VALUES (1, 'Mike', 'Roger'), (2, 'Linda', 'Albright')"); + +// CTAS into a new HiveCatalog table +shell.executeStatement(String.format( +"CREATE TABLE target STORED BY '%s' TBLPROPERTIES ('%s'='%s') AS SELECT * FROM source", +HiveIcebergStorageHandler.class.getName(), +TableProperties.DEFAULT_FILE_FORMAT, fileFormat)); + +List objects = shell.executeStatement("SELECT * FROM target ORDER BY customer_id"); +Assert.assertEquals(2, objects.size()); +Assert.assertArrayEquals(new Object[]{1L, "Mike", "Roger"}, objects.get(0)); +Assert.assertArrayEquals(new Object[]{2L, "Linda", "Albright"}, objects.get(1)); + } + + @Test + public void testCTASFailureRollback() throws IOException { Review comment: Could you please add a test method that checks the rollback in case the source and dest tables are in different catalogs? ## File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergCTASHook.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.iceberg.mr.hive; + +import java.util.Properties; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook; +import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext; +import org.apache.iceberg.mr.Catalogs; +import org.apache.iceberg.mr.InputFormatConfig; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HiveIcebergCTASHook implements QueryLifeTimeHook
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599055 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 07:54 Start Date: 19/May/21 07:54 Worklog Time Spent: 10m Work Description: marton-bod commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r634998109 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) .getMsg(destinationPath.toUri().toString())); } } + // handle direct insert CTAS case + // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler, + // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called + if (qb.isCTAS() && tableDesc != null && tableDesc.getStorageHandler() != null) { +try { + if (HiveUtils.getStorageHandler(conf, tableDesc.getStorageHandler()).directInsertCTAS()) { +createPreInsertDesc(destinationTable, false); + } +} catch (HiveException e) { Review comment: Right. Now that I think about it, the main reason I swallowed the exception is that this is a general hive codepath, so didn't want to screw up any normal hive ctas queries. But I think we can assume that even for normal hive ctas, if the table has a storage handler defined, it should be loadable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599055) Time Spent: 3.5h (was: 3h 20m) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver
[ https://issues.apache.org/jira/browse/HIVE-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347384#comment-17347384 ] Peter Vary commented on HIVE-25122: --- I would keep for a while, so if there is a reoccurrence of the issue others might find it. If this was only a one off issue then we can close it. Or we can just run a flaky test check on this test, to see if it is failing. If not, then we can close. http://ci.hive.apache.org/job/hive-flaky-check > Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver > -- > > Key: HIVE-25122 > URL: https://issues.apache.org/jira/browse/HIVE-25122 > Project: Hive > Issue Type: Bug >Reporter: Harish JP >Priority: Minor > Attachments: org.apache.hadoop.hive.cli.TestBeeLineDriver.txt > > > Hive test is failing with error. The build link where it failed: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2120/4/tests/] > Error info: [^org.apache.hadoop.hive.cli.TestBeeLineDriver.txt] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-25122) Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver
[ https://issues.apache.org/jira/browse/HIVE-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347384#comment-17347384 ] Peter Vary edited comment on HIVE-25122 at 5/19/21, 7:53 AM: - We can just run a flaky test check on this test, to see if it is failing. If not, then I think we can close. http://ci.hive.apache.org/job/hive-flaky-check was (Author: pvary): I would keep for a while, so if there is a reoccurrence of the issue others might find it. If this was only a one off issue then we can close it. Or we can just run a flaky test check on this test, to see if it is failing. If not, then we can close. http://ci.hive.apache.org/job/hive-flaky-check > Intermittent test failures in org.apache.hadoop.hive.cli.TestBeeLineDriver > -- > > Key: HIVE-25122 > URL: https://issues.apache.org/jira/browse/HIVE-25122 > Project: Hive > Issue Type: Bug >Reporter: Harish JP >Priority: Minor > Attachments: org.apache.hadoop.hive.cli.TestBeeLineDriver.txt > > > Hive test is failing with error. The build link where it failed: > [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2120/4/tests/] > Error info: [^org.apache.hadoop.hive.cli.TestBeeLineDriver.txt] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25034) Implement CTAS for Iceberg
[ https://issues.apache.org/jira/browse/HIVE-25034?focusedWorklogId=599052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599052 ] ASF GitHub Bot logged work on HIVE-25034: - Author: ASF GitHub Bot Created on: 19/May/21 07:46 Start Date: 19/May/21 07:46 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #2243: URL: https://github.com/apache/hive/pull/2243#discussion_r634992707 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -7759,6 +7759,18 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) .getMsg(destinationPath.toUri().toString())); } } + // handle direct insert CTAS case + // for direct insert CTAS, the table creation DDL is not added to the task plan in TaskCompiler, + // therefore we need to add the InsertHook here manually so that HiveMetaHook#commitInsertTable is called + if (qb.isCTAS() && tableDesc != null && tableDesc.getStorageHandler() != null) { +try { + if (HiveUtils.getStorageHandler(conf, tableDesc.getStorageHandler()).directInsertCTAS()) { +createPreInsertDesc(destinationTable, false); + } +} catch (HiveException e) { Review comment: I think that would make sense, as we are not able to perform the CTAS without the classes on classpath anyway. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599052) Time Spent: 3h 20m (was: 3h 10m) > Implement CTAS for Iceberg > -- > > Key: HIVE-25034 > URL: https://issues.apache.org/jira/browse/HIVE-25034 > Project: Hive > Issue Type: New Feature >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=599051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599051 ]

ASF GitHub Bot logged work on HIVE-25130:
- Author: ASF GitHub Bot
- Created on: 19/May/21 07:43
- Start Date: 19/May/21 07:43
- Worklog Time Spent: 10m

Work Description: kishendas commented on a change in pull request #2285:
URL: https://github.com/apache/hive/pull/2285#discussion_r634990814

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

## @@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String filename) {
    * @param filename
    *          filename to extract taskid from
    */
 - private static String getPrefixedTaskIdFromFilename(String filename) {
 + static String getPrefixedTaskIdFromFilename(String filename) {
     return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX);
   }

   private static String getTaskIdFromFilename(String filename, Pattern pattern) {
 -   return getIdFromFilename(filename, pattern, 1);
 +   return getIdFromFilename(filename, pattern, 1, false);
   }

 - private static int getAttemptIdFromFilename(String filename) {
 -   String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3);
 + static int getAttemptIdFromFilename(String filename) {
 +   String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true);
     return Integer.parseInt(attemptStr.substring(1));
   }

 - private static String getIdFromFilename(String filename, Pattern pattern, int group) {
 + private static String getIdFromFilename(String filename, Pattern pattern, int group, boolean extractAttemptId) {
     String taskId = filename;
     int dirEnd = filename.lastIndexOf(Path.SEPARATOR);
 -   if (dirEnd != -1) {
 +   if (dirEnd!=-1) {
       taskId = filename.substring(dirEnd + 1);
     }
 -   Matcher m = pattern.matcher(taskId);
 -   if (!m.matches()) {
 -     LOG.warn("Unable to get task id from file name: {}. Using last component {}"
 -         + " as task id.", filename, taskId);
 +   // Spark emitted files have the format part-[number-string]-uuid..
 +   // Examples: part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 is the taskId
 +   // and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 4-c6acfdee is the taskId
 +   String strings[] = taskId.split("-");

Review comment: Agreed. I am not sure if the file format has changed in recent times. Let me talk to the Spark team and get their insights as well.

Issue Time Tracking
---
Worklog Id: (was: 599051)
Time Spent: 40m (was: 0.5h)

> alter table concat gives NullPointerException, when data is inserted from
> Spark
> ---
>
> Key: HIVE-25130
> URL: https://issues.apache.org/jira/browse/HIVE-25130
> Project: Hive
> Issue Type: Bug
> Reporter: Kishen Das
> Assignee: Kishen Das
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> This is the complete stack trace of the NullPointerException:
> 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333)
> at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966)
> at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797)
> at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674)
> at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544)
> at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
> at org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeT
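For context on the file-name convention the review above discusses: Spark emits data files named `part-<taskId>-<uuid>...`, and per the comments in the diff the task id spans the first two dash-separated components after `part` (e.g. `00026-23003837`). A minimal standalone sketch of that extraction, with a hypothetical helper name (this is not Hive's actual `getIdFromFilename`, just an illustration of the dash-splitting idea under those assumptions):

```java
// Hypothetical sketch: pull the Spark task id out of file names like
// part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc -> "00026-23003837"
// part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000                  -> "4-c6acfdee"
public class SparkFileNameSketch {
    // Returns the task id component, or null if the name does not look Spark-emitted.
    static String taskIdOf(String filename) {
        // Strip any directory prefix first (analogous to lastIndexOf(Path.SEPARATOR)).
        int dirEnd = filename.lastIndexOf('/');
        String name = dirEnd == -1 ? filename : filename.substring(dirEnd + 1);

        // Split on '-'; a Spark file name needs at least "part", a number, and a uuid part.
        String[] parts = name.split("-");
        if (parts.length < 3 || !parts[0].equals("part")) {
            return null; // not a Spark-style name; caller should fall back to the regex path
        }
        return parts[1] + "-" + parts[2];
    }

    public static void main(String[] args) {
        System.out.println(taskIdOf("part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc"));
        System.out.println(taskIdOf("part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000"));
    }
}
```

Returning null instead of throwing is deliberate here: the stack trace shows the NPE surfacing in `getAttemptIdFromFilename` during job commit, so a parser that signals "no match" explicitly lets the caller choose a fallback rather than crash the commit.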