[jira] [Work logged] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?focusedWorklogId=393210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393210 ]

ASF GitHub Bot logged work on HIVE-22824:
-
Author: ASF GitHub Bot
Created on: 26/Feb/20 07:14
Start Date: 26/Feb/20 07:14
Worklog Time Spent: 10m
Work Description: kgyrtkirk commented on pull request #897: HIVE-22824: JoinProjectTranspose rule should skip Projects containing…
URL: https://github.com/apache/hive/pull/897#discussion_r384308354

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
## @@ -487,7 +483,7 @@ Operator genOPTree(PlannerContext plannerCtx) throws SemanticException {
 ASTNode newAST = getOptimizedAST(newPlan);
 // 1.1. Fix up the query for insert/ctas/materialized views
-newAST = fixUpAfterCbo(this.getAST(), newAST, cboCtx);

Review comment: ok; after all we want to have cbo on more often than off

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 393210)
Time Spent: 40m (was: 0.5h)

> JoinProjectTranspose rule should skip Projects containing windowing expression
> --
>
> Key: HIVE-22824
> URL: https://issues.apache.org/jira/browse/HIVE-22824
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 4.0.0
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Otherwise this rule could end up creating plan with windowing expression within join condition which hive doesn't know how to process.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HIVE-21410) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/HIVE-21410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated HIVE-21410:
---
Description:
Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out which actual port number we should use when connecting with beeline. Logging the actual port number would help us get the correct connection URI.

before fixed: !2019-03-08_163705.png!
after fixed: !2019-03-08_163747.png!

was:
before fixed: !2019-03-08_163705.png!
after fixed: !2019-03-08_163747.png!

> find out the actual port number when hive.server2.thrift.port=0
> ---
>
> Key: HIVE-21410
> URL: https://issues.apache.org/jira/browse/HIVE-21410
> Project: Hive
> Issue Type: Improvement
> Reporter: zuotingbing
> Assignee: zuotingbing
> Priority: Minor
> Attachments: 2019-03-08_163705.png, 2019-03-08_163747.png, HIVE-21410.patch
>
> Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out which actual port number we should use when connecting with beeline. Logging the actual port number would help us get the correct connection URI.
>
> before fixed:
> !2019-03-08_163705.png!
> after fixed:
> !2019-03-08_163747.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
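The behaviour behind *hive.server2.thrift.port=0* can be sketched in plain Java (a minimal illustration of ephemeral-port binding, not HiveServer2's actual Thrift server code): binding to port 0 asks the OS for a free port, and only `getLocalPort()` on the server side reveals which one was chosen, which is why logging it matters.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Port 0 means "let the OS pick an ephemeral port"; the chosen port is only
// discoverable on the server side, so it must be logged to be usable by clients.
ServerSocket server = new ServerSocket(0);
int actualPort = server.getLocalPort(); // the value worth logging
System.out.println("Listening on port " + actualPort);
server.close();
```

A client such as beeline has no way to guess this number, so without the log line the ephemeral-port feature is effectively unusable.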
[jira] [Assigned] (HIVE-22929) Performance: quoted identifier parsing uses throwaway Regex via String.replaceAll()
[ https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-22929: - Assignee: Krisztian Kasa > Performance: quoted identifier parsing uses throwaway Regex via > String.replaceAll() > --- > > Key: HIVE-22929 > URL: https://issues.apache.org/jira/browse/HIVE-22929 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Assignee: Krisztian Kasa >Priority: Major > Attachments: String.replaceAll.png > > > !String.replaceAll.png! > https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530 > {code} > '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, > getText().length() -1 ).replaceAll("``", "`")); } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
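The performance point above can be illustrated with a minimal sketch (not the actual lexer change): `replaceAll` interprets its first argument as a regular expression and compiles a throwaway `Pattern` on every call, whereas the literal `replace` yields the same string here without any regex compilation.

```java
// Unescaping a quoted identifier: strip the outer backticks, then collapse
// doubled backticks. Backtick is not a regex metacharacter, so both calls
// return the same string -- but replaceAll pays for a fresh Pattern each time.
String quoted = "`my``table`";
String body = quoted.substring(1, quoted.length() - 1);
String viaRegex = body.replaceAll("``", "`");  // compiles a throwaway Pattern
String viaLiteral = body.replace("``", "`");   // literal substitution, no regex
```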
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045154#comment-17045154 ]

Hive QA commented on HIVE-22840:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994602/HIVE-22840.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18075 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_2] (batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20831/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20831/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994602 - PreCommit-HIVE-Build

> Race condition in formatters of TimestampColumnVector and DateColumnVector
> ---
>
> Key: HIVE-22840
> URL: https://issues.apache.org/jira/browse/HIVE-22840
> Project: Hive
> Issue Type: Bug
> Components: storage-api
> Reporter: László Bodor
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22840.03.patch, HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause races in some scenarios.
> As a result of those race conditions, we see some exceptions like
> {code:java}
> 1) java.lang.NumberFormatException: For input string: ""
> OR
> java.lang.NumberFormatException: For input string: ".821582E.821582E44"
> OR
> 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980
> at sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453)
> at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397)
> {code}
> This issue is to address those thread-safety issues/race conditions.
> cc [~jcamachorodriguez] [~abstractdog] [~omalley]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
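A common mitigation for this kind of race can be sketched as follows (a minimal illustration, not necessarily the approach taken in the attached patches): since SimpleDateFormat carries mutable internal state, each thread gets its own instance through a ThreadLocal, so no state is ever shared across threads.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// One formatter per thread: withInitial builds a fresh SimpleDateFormat the
// first time each thread calls get(), so the mutable parse/format state
// inside SimpleDateFormat is never touched by two threads at once.
ThreadLocal<SimpleDateFormat> FORMATTER = ThreadLocal.withInitial(() -> {
    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    f.setTimeZone(TimeZone.getTimeZone("UTC"));
    return f;
});
String formatted = FORMATTER.get().format(new Date(0L)); // epoch in UTC
```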
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045137#comment-17045137 ]

Hive QA commented on HIVE-22840:

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 24s{color} | {color:blue} storage-api in master has 58 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 35s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 43s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 50s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s{color} | {color:red} storage-api: The patch generated 4 new + 17 unchanged - 3 fixed = 21 total (was 20) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} common: The patch generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} The patch serde passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} The patch ql passed checkstyle {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} storage-api generated 0 new + 48 unchanged - 10 fixed = 48 total (was 58) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} serde in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 27s{color} | {color:green} ql in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 38s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile xml |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20831/dev-support/hive-personality.sh |
| git revision | master / 0280984 |
| Default Java | 1.8.0_111 |
| findbugs |
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.03.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22840.03.patch, HIVE-22840.1.patch, > HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22930) Performance: ASTNode::getName() allocates within the walker loops
[ https://issues.apache.org/jira/browse/HIVE-22930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045106#comment-17045106 ] Gopal Vijayaraghavan commented on HIVE-22930: - No idea why that is even used here - the numbers here are not constants across builds. > Performance: ASTNode::getName() allocates within the walker loops > - > > Key: HIVE-22930 > URL: https://issues.apache.org/jira/browse/HIVE-22930 > Project: Hive > Issue Type: Bug >Reporter: Gopal Vijayaraghavan >Priority: Major > Attachments: ASTNode-name.png > > > {code} > /* >* (non-Javadoc) >* >* @see org.apache.hadoop.hive.ql.lib.Node#getName() >*/ > @Override > public String getName() { > return String.valueOf(super.getToken().getType()); > } > {code} > !ASTNode-name.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
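Since getName() returns String.valueOf of a small integer token type, one hedged way to avoid the per-call allocation (illustrative only, not the committed fix; because the cache below is filled at runtime, it does not rely on token type numbers being stable across builds) is to memoize the computed names:

```java
import java.util.concurrent.ConcurrentHashMap;

// String.valueOf(int) allocates a new String on every getName() call inside
// the walker loops; token types are a small set of ints, so each name can be
// computed once and reused.
ConcurrentHashMap<Integer, String> nameCache = new ConcurrentHashMap<>();
String first = nameCache.computeIfAbsent(42, k -> Integer.toString(k));
String second = nameCache.computeIfAbsent(42, k -> Integer.toString(k));
// second is the very same String instance as first: no new allocation
```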
[jira] [Commented] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045064#comment-17045064 ] Jesus Camacho Rodriguez commented on HIVE-22824: +1 (pending tests) > JoinProjectTranspose rule should skip Projects containing windowing expression > -- > > Key: HIVE-22824 > URL: https://issues.apache.org/jira/browse/HIVE-22824 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, > HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Otherwise this rule could end up creating plan with windowing expression > within join condition which hive doesn't know how to process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name
[ https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045061#comment-17045061 ] Thomas Poepping commented on HIVE-22928: Also, it's been a _really_ long time since I've contributed. Are we still doing patch files? GitHub pull requests? > Allow hive.exec.stagingdir to be a fully qualified directory name > - > > Key: HIVE-22928 > URL: https://issues.apache.org/jira/browse/HIVE-22928 > Project: Hive > Issue Type: Improvement > Components: Configuration, Hive >Affects Versions: 3.1.2 >Reporter: Thomas Poepping >Assignee: Thomas Poepping >Priority: Minor > > Currently, {{hive.exec.stagingdir}} can only be set as a relative directory > name that, for operations like {{insert}} or {{insert overwrite}}, will be > placed either under the table directory or the partition directory. > For cases where an HDFS cluster is small but the data being inserted is very > large (greater than the capacity of the HDFS cluster, as mentioned in a > comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their > staging directory to be an explicit blobstore path (or any filesystem path), > rather than relying on Hive to intelligently build the blobstore path based > on an interpretation of the job. We may lose locality guarantees, but because > renames are just as expensive on blobstores no matter what the prefix is, > this isn't considered a terribly large loss (assuming only blobstore > customers use this functionality). > Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually > suffice in this case, as the stagingdir is not the same. > This commit enables Hive customers to set an absolute location for all > staging directories. For instances where the configured stagingdir scheme is > not the same as the scheme for the table location, the default stagingdir > configuration is used. This avoids a cross-filesystem rename, which is > impossible anyway. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
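With the proposed change, an absolute staging directory would presumably be configured as below (the property name is real; the bucket path is a hypothetical example):

```xml
<!-- hive-site.xml: hypothetical value illustrating an absolute stagingdir -->
<property>
  <name>hive.exec.stagingdir</name>
  <value>s3://example-bucket/tmp/hive-staging</value>
</property>
```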
[jira] [Commented] (HIVE-22827) Update Flatbuffer version
[ https://issues.apache.org/jira/browse/HIVE-22827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045053#comment-17045053 ]

Hive QA commented on HIVE-22827:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994580/HIVE-22827.99.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 18073 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[cbo_query31] (batchId=305)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testComplexQuery (batchId=290)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testDataTypes (batchId=290)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testEscapedStrings (batchId=290)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testLlapInputFormatEndToEnd (batchId=290)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testNonAsciiStrings (batchId=290)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testComplexQuery (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testDataTypes (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testEscapedStrings (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testLlapInputFormatEndToEnd (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testNonAsciiStrings (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testTypesNestedInListWithLimitAndFilters (batchId=292)
org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow.testTypesNestedInMapWithLimitAndFilters (batchId=292)
org.apache.hive.jdbc.TestNewGetSplitsFormat.testComplexQuery (batchId=290)
org.apache.hive.jdbc.TestNewGetSplitsFormat.testDataTypes (batchId=290)
org.apache.hive.jdbc.TestNewGetSplitsFormat.testEscapedStrings (batchId=290)
org.apache.hive.jdbc.TestNewGetSplitsFormat.testLlapInputFormatEndToEnd (batchId=290)
org.apache.hive.jdbc.TestNewGetSplitsFormat.testNonAsciiStrings (batchId=290)
org.apache.hive.jdbc.TestNewGetSplitsFormatReturnPath.testComplexQuery (batchId=292)
org.apache.hive.jdbc.TestNewGetSplitsFormatReturnPath.testDataTypes (batchId=292)
org.apache.hive.jdbc.TestNewGetSplitsFormatReturnPath.testEscapedStrings (batchId=292)
org.apache.hive.jdbc.TestNewGetSplitsFormatReturnPath.testLlapInputFormatEndToEnd (batchId=292)
org.apache.hive.jdbc.TestNewGetSplitsFormatReturnPath.testNonAsciiStrings (batchId=292)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20829/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20829/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20829/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994580 - PreCommit-HIVE-Build

> Update Flatbuffer version
> -
>
> Key: HIVE-22827
> URL: https://issues.apache.org/jira/browse/HIVE-22827
> Project: Hive
> Issue Type: Improvement
> Components: Serializers/Deserializers
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22827.99.patch, HIVE-22827.patch
>
> Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more up-to-date version, e.g. 1.6.0.1. Upgrade to that version.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name
[ https://issues.apache.org/jira/browse/HIVE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Poepping reassigned HIVE-22928: -- > Allow hive.exec.stagingdir to be a fully qualified directory name > - > > Key: HIVE-22928 > URL: https://issues.apache.org/jira/browse/HIVE-22928 > Project: Hive > Issue Type: Improvement > Components: Configuration, Hive >Affects Versions: 3.1.2 >Reporter: Thomas Poepping >Assignee: Thomas Poepping >Priority: Minor > > Currently, {{hive.exec.stagingdir}} can only be set as a relative directory > name that, for operations like {{insert}} or {{insert overwrite}}, will be > placed either under the table directory or the partition directory. > For cases where an HDFS cluster is small but the data being inserted is very > large (greater than the capacity of the HDFS cluster, as mentioned in a > comment by [~ashutoshc] on [HIVE-14270]), the client may want to set their > staging directory to be an explicit blobstore path (or any filesystem path), > rather than relying on Hive to intelligently build the blobstore path based > on an interpretation of the job. We may lose locality guarantees, but because > renames are just as expensive on blobstores no matter what the prefix is, > this isn't considered a terribly large loss (assuming only blobstore > customers use this functionality). > Note that {{hive.blobstore.use.blobstore.as.scratchdir}} doesn't actually > suffice in this case, as the stagingdir is not the same. > This commit enables Hive customers to set an absolute location for all > staging directories. For instances where the configured stagingdir scheme is > not the same as the scheme for the table location, the default stagingdir > configuration is used. This avoids a cross-filesystem rename, which is > impossible anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045052#comment-17045052 ] Vineet Garg commented on HIVE-22824: [~jcamachorodriguez] Added the test and TODO in latest change. Also opened CALCITE-3824 > JoinProjectTranspose rule should skip Projects containing windowing expression > -- > > Key: HIVE-22824 > URL: https://issues.apache.org/jira/browse/HIVE-22824 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, > HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Otherwise this rule could end up creating plan with windowing expression > within join condition which hive doesn't know how to process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393070 ]

ASF GitHub Bot logged work on HIVE-22893:
-
Author: ASF GitHub Bot
Created on: 26/Feb/20 00:09
Start Date: 26/Feb/20 00:09
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197923

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
## @@ -2519,6 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
 "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
 " bitvectors to achieve the accuracy."),
+HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,
+"Statestimators are able to provide more accurate column statistic infos for UDF results."),

Review comment: Statestimators -> Estimators

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393070) Time Spent: 0.5h (was: 20m) > Enhance data size estimation for fields computed by UDFs > > > Key: HIVE-22893 > URL: https://issues.apache.org/jira/browse/HIVE-22893 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, > HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, > HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, > HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch, > HIVE-22893.12.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Right now if we have columnstat on a column ; we use that to estimate things > about the column; - however if an UDF is executed on a column ; the resulting > column is treated as unknown thing and defaults are assumed. > An improvement could be to give wide estimation(s) in case of frequently used > udf. > For example; consider {{substr(c,1,1)}} ; no matter what the input; the > output is at most a 1 long string -- This message was sent by Atlassian Jira (v8.3.4#803005)
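The {{substr(c,1,1)}} example above can be sketched with a toy estimator (the interface and class names here are made up for illustration and are not Hive's actual estimator API):

```java
import java.util.Optional;

// Hypothetical per-UDF estimator: given the input column's max length,
// bound the output column's max length.
interface MaxLengthEstimator {
    Optional<Integer> estimateMaxLength(int inputMaxLength);
}

// substr(c, start, n) can never produce more than n characters, regardless
// of how wide the input column is, so the output stat can be tightly bounded.
class SubstrMaxLengthEstimator implements MaxLengthEstimator {
    private final int n;
    SubstrMaxLengthEstimator(int n) { this.n = n; }
    @Override
    public Optional<Integer> estimateMaxLength(int inputMaxLength) {
        return Optional.of(Math.min(inputMaxLength, n));
    }
}

MaxLengthEstimator est = new SubstrMaxLengthEstimator(1);
int bound = est.estimateMaxLength(200).orElse(200); // a 200-char column is bounded to 1
```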
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393063 ]

ASF GitHub Bot logged work on HIVE-22893:
-
Author: ASF GitHub Bot
Created on: 26/Feb/20 00:09
Start Date: 26/Feb/20 00:09
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197505

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java
## @@ -132,9 +133,11 @@ public String generateInitFileName(String toVersion) throws HiveMetaException {
 String initScriptName = INIT_FILE_PREFIX + toVersion + "." + dbType + SQL_FILE_EXTENSION;
 // check if the file exists
-if (!(new File(getMetaStoreScriptDir() + File.separatorChar +
- initScriptName).exists())) {
- throw new HiveMetaException("Unknown version specified for initialization: " + toVersion);
+File file = new File(getMetaStoreScriptDir() + File.separatorChar +
+ initScriptName);
+if (!(file.exists())) {

Review comment: nit. enclosing () for file not needed

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393063) Time Spent: 20m (was: 10m) > Enhance data size estimation for fields computed by UDFs > > > Key: HIVE-22893 > URL: https://issues.apache.org/jira/browse/HIVE-22893 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, > HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, > HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, > HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch, > HIVE-22893.12.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Right now if we have columnstat on a column ; we use that to estimate things > about the column; - however if an UDF is executed on a column ; the resulting > column is treated as unknown thing and defaults are assumed. > An improvement could be to give wide estimation(s) in case of frequently used > udf. > For example; consider {{substr(c,1,1)}} ; no matter what the input; the > output is at most a 1 long string -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393069 ]

ASF GitHub Bot logged work on HIVE-22893:
-
Author: ASF GitHub Bot
Created on: 26/Feb/20 00:09
Start Date: 26/Feb/20 00:09
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate
URL: https://github.com/apache/hive/pull/915#discussion_r384197755

## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
## @@ -2519,6 +2519,9 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "higher compute cost. (NDV means the number of distinct values.). It only affects the FM-Sketch \n" +
 "(not the HLL algorithm which is the default), where it computes the number of necessary\n" +
 " bitvectors to achieve the accuracy."),
+HIVE_STATS_USE_UDF_ESTIMATORS("hive.stats.use.statestimators", true,

Review comment: 'hive.stats.use.statestimators' -> 'hive.stats.estimators.enable'

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393069) > Enhance data size estimation for fields computed by UDFs > > > Key: HIVE-22893 > URL: https://issues.apache.org/jira/browse/HIVE-22893 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, > HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, > HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, > HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch, > HIVE-22893.12.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Right now if we have columnstat on a column ; we use that to estimate things > about the column; - however if an UDF is executed on a column ; the resulting > column is treated as unknown thing and defaults are assumed. > An improvement could be to give wide estimation(s) in case of frequently used > udf. > For example; consider {{substr(c,1,1)}} ; no matter what the input; the > output is at most a 1 long string -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393072 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384199270 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -73,6 +74,9 @@ import org.apache.hadoop.hive.ql.plan.Statistics; import org.apache.hadoop.hive.ql.plan.Statistics.State; import org.apache.hadoop.hive.ql.stats.BasicStats.Factory; +import org.apache.hadoop.hive.ql.stats.estimator.IStatEstimator; Review comment: Currently in Hive we do not seem to prefix interfaces with I (maybe we should start doing it and I know other projects do, but I wanted to mention it since it does not follow current convention). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393072) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393068 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384203016 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java ## @@ -131,4 +137,52 @@ public BytesWritable evaluate(BytesWritable bw, IntWritable pos, IntWritable len public BytesWritable evaluate(BytesWritable bw, IntWritable pos){ return evaluate(bw, pos, maxValue); } + + @Override + public Optional<IStatEstimator> getStatEstimator() { +return Optional.of(new SubStrStatEstimator()); + } + + private static class SubStrStatEstimator implements IStatEstimator { + +@Override +public Optional<ColStatistics> estimate(List<ColStatistics> csList) { + ColStatistics cs = csList.get(0).clone(); + + // this might be bad in a skewed case; consider: + // 1 row with 1000 long string + // 99 rows with 0 length + // orig avg is 10 + // new avg is 5 (if substr(5)) ; but in reality it will stay ~10 + Optional start = getRangeWidth(csList.get(1).getRange()); + Range startRange = csList.get(1).getRange(); + if (startRange != null && startRange.minValue != null) { +double newAvgColLen = cs.getAvgColLen() - startRange.minValue.doubleValue(); +if (newAvgColLen > 0) { + cs.setAvgColLen(newAvgColLen); +} + Review comment: you can remove \n This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393068)
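The skew caveat in the code comment of the diff above can be worked through numerically. This sketch (hypothetical names, not the patch's code) contrasts the naive "shift the average by the substr start" estimate with the true average for the 1-long-row / 99-empty-rows case:

```java
// Hypothetical sketch of the skew caveat from the review comment above.
public class SubstrSkewDemo {

    // Naive estimate: shift the average length down by the substr start offset.
    public static double naiveEstimate(double avgColLen, double startMin) {
        double v = avgColLen - startMin;
        return v > 0 ? v : avgColLen;
    }

    // Exact average length after substr(c, start) over the real row lengths.
    public static double trueAvg(int[] lengths, int start) {
        double sum = 0;
        for (int len : lengths) {
            sum += Math.max(0, len - (start - 1));
        }
        return sum / lengths.length;
    }

    public static void main(String[] args) {
        int[] lengths = new int[100];
        lengths[0] = 1000;              // one 1000-char row, 99 zero-length rows
        double origAvg = 1000.0 / 100;  // original avg col len = 10
        System.out.println(naiveEstimate(origAvg, 5)); // the patch's estimate: 5.0
        System.out.println(trueAvg(lengths, 5));       // actual value stays near 10
    }
}
```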
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393065 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384200546 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis } } + if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) { +Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class); +if (sep.isPresent()) { + Optional<IStatEstimator> se = sep.get().getStatEstimator(); Review comment: Should we assume that if a UDF is an estimator provider, it should provide an estimator, and thus, throw an exception here if we cannot get the estimator? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393065) Time Spent: 20m (was: 10m)
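The adapt-to-capability pattern discussed in the review above can be sketched in miniature. All names here are illustrative, not Hive's actual classes; the double Optional (an optional provider which may itself return an empty estimator) is what the reviewer is questioning:

```java
import java.util.Optional;

// Hypothetical sketch of the adapt-to-capability pattern; names are
// illustrative, not Hive's actual UDF or estimator classes.
public class AdaptSketch {

    public interface StatEstimator { double estimate(double inputAvgLen); }

    public interface StatEstimatorProvider { Optional<StatEstimator> getStatEstimator(); }

    public static class BaseUdf {
        // adapt() exposes an optional capability without forcing every UDF
        // to implement the stats interface.
        public <T> Optional<T> adapt(Class<T> clazz) {
            return clazz.isInstance(this) ? Optional.of(clazz.cast(this)) : Optional.empty();
        }
    }

    public static class SubstrUdf extends BaseUdf implements StatEstimatorProvider {
        @Override
        public Optional<StatEstimator> getStatEstimator() {
            // substr(c, 1, 1): output length is at most 1
            return Optional.of(avg -> Math.min(avg, 1.0));
        }
    }

    public static void main(String[] args) {
        BaseUdf udf = new SubstrUdf();
        Optional<StatEstimatorProvider> p = udf.adapt(StatEstimatorProvider.class);
        // A plain BaseUdf would yield Optional.empty() here, and the caller
        // would fall back to default statistics.
        System.out.println(p.isPresent());
    }
}
```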
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393067 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384201501 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -1590,6 +1610,43 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis return colStats; } + private static ColStatistics buildColStatForConstant(HiveConf conf, long numRows, ExprNodeConstantDesc encd) { + +long numNulls = 0; +long countDistincts = 0; +if (encd.getValue() == null) { + // null projection + numNulls = numRows; +} else { + countDistincts = 1; +} +String colType = encd.getTypeString(); +colType = colType.toLowerCase(); +ObjectInspector oi = encd.getWritableObjectInspector(); +double avgColSize = getAvgColLenOf(conf, oi, colType); +ColStatistics colStats = new ColStatistics(encd.getName(), colType); +colStats.setAvgColLen(avgColSize); +colStats.setCountDistint(countDistincts); +colStats.setNumNulls(numNulls); + +Optional<Long> value = getLongConstValue(encd); Review comment: What about other types? Can we either add them in this patch, or add a comment and create a follow-up? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393067)
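In the spirit of the reviewer's "what about other types?" question above, constant-value extraction could be widened beyond long. This is a hypothetical sketch: `getNumericConstValue` is an illustrative name, not a Hive helper.

```java
import java.util.Optional;

// Hypothetical sketch of widening constant-value extraction beyond long;
// getNumericConstValue is an illustrative name, not a Hive helper.
public class ConstValueSketch {

    public static Optional<Double> getNumericConstValue(Object constant) {
        if (constant instanceof Number) {
            // covers Byte, Short, Integer, Long, Float, Double, BigDecimal, ...
            return Optional.of(((Number) constant).doubleValue());
        }
        if (constant instanceof Boolean) {
            return Optional.of(((Boolean) constant) ? 1.0 : 0.0);
        }
        // strings, dates, intervals, etc. would need their own handling
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(getNumericConstValue(42L));
        System.out.println(getNumericConstValue("hello"));
    }
}
```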
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393066 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384200980 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java ## @@ -1560,6 +1554,32 @@ public static ColStatistics getColStatisticsFromExpression(HiveConf conf, Statis } } + if (conf.getBoolVar(ConfVars.HIVE_STATS_USE_UDF_ESTIMATORS)) { +Optional<IStatEstimatorProvider> sep = engfd.getGenericUDF().adapt(IStatEstimatorProvider.class); +if (sep.isPresent()) { + Optional<IStatEstimator> se = sep.get().getStatEstimator(); + if (se.isPresent()) { +List<ColStatistics> csList = new ArrayList<>(); +for (ExprNodeDesc child : engfd.getChildren()) { + ColStatistics cs = getColStatisticsFromExpression(conf, parentStats, child); + if (cs == null) { +break; + } + csList.add(cs); +} +if(csList.size() == engfd.getChildren().size()) { + Optional<ColStatistics> res = se.get().estimate(csList); + if (res.isPresent()) { Review comment: When could this happen? Maybe when column stats has anything unexpected? Could we add a comment clarifying it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393066)
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393071 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384202325 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java ## @@ -0,0 +1,51 @@ +package org.apache.hadoop.hive.ql.stats.estimator; + +import java.util.Optional; + +import org.apache.hadoop.hive.ql.plan.ColStatistics; + +public class StatEstimators { + + public static class WorstStatCombiner { Review comment: Should this implement an interface? (I do not have a strong opinion, just a comment) Nit. Can the name be changed? DefaultCombiner? Then you can add in the comments that it considers max upper bounds for the properties, etc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393071) Time Spent: 0.5h (was: 20m)
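The "max upper bounds for the properties" behavior the reviewer describes for WorstStatCombiner can be sketched as follows. Field and class names here are illustrative, not the patch's ColStatistics API:

```java
// Hypothetical sketch of the worst-case stat combiner discussed above
// (the reviewer suggests the name DefaultCombiner); names are illustrative,
// not the patch's ColStatistics API.
public class WorstCaseCombinerSketch {

    public static final class ColStats {
        public final double avgColLen;
        public final long numNulls;
        public ColStats(double avgColLen, long numNulls) {
            this.avgColLen = avgColLen;
            this.numNulls = numNulls;
        }
    }

    // Take the upper bound of each property so the combined estimate
    // is pessimistic but never underestimates data size.
    public static ColStats combine(ColStats a, ColStats b) {
        return new ColStats(Math.max(a.avgColLen, b.avgColLen),
                            Math.max(a.numNulls, b.numNulls));
    }

    public static void main(String[] args) {
        ColStats c = combine(new ColStats(4.0, 10), new ColStats(8.0, 2));
        System.out.println(c.avgColLen + " " + c.numNulls);
    }
}
```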
[jira] [Work logged] (HIVE-22893) Enhance data size estimation for fields computed by UDFs
[ https://issues.apache.org/jira/browse/HIVE-22893?focusedWorklogId=393064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393064 ] ASF GitHub Bot logged work on HIVE-22893: - Author: ASF GitHub Bot Created on: 26/Feb/20 00:09 Start Date: 26/Feb/20 00:09 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #915: HIVE-22893 StatEstimate URL: https://github.com/apache/hive/pull/915#discussion_r384201744 ## File path: ql/src/java/org/apache/hadoop/hive/ql/stats/estimator/StatEstimators.java ## @@ -0,0 +1,51 @@ +package org.apache.hadoop.hive.ql.stats.estimator; + +import java.util.Optional; + +import org.apache.hadoop.hive.ql.plan.ColStatistics; + +public class StatEstimators { Review comment: Could we add comments? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 393064) Time Spent: 20m (was: 10m)
[jira] [Commented] (HIVE-22827) Update Flatbuffer version
[ https://issues.apache.org/jira/browse/HIVE-22827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045027#comment-17045027 ] Hive QA commented on HIVE-22827: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 1 ASF License warnings.
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 12m 17s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc xml compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20829/dev-support/hive-personality.sh | | git revision | master / 0280984 | | Default Java | 1.8.0_111 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20829/yetus/patch-asflicense-problems.txt | | modules | C: serde U: serde | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20829/yetus.txt | | Powered by | Apache Yetus http://yetus.apache.org | This message was automatically generated. > Update Flatbuffer version > - > > Key: HIVE-22827 > URL: https://issues.apache.org/jira/browse/HIVE-22827 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22827.99.patch, HIVE-22827.patch > > > Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more > up-to-date version, e.g. 1.6.0.1. Upgrade to that version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045018#comment-17045018 ] Ramesh Kumar Thangarajan commented on HIVE-22920: - [~ashutoshc], Can you please help review this? I was able to get a green run. > Add row format OpenCSVSerde to the metastore column managed list > > > Key: HIVE-22920 > URL: https://issues.apache.org/jira/browse/HIVE-22920 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22920.1.patch, HIVE-22920.2.patch > > > Add row format OpenCSVSerde to the metastore column managed list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045016#comment-17045016 ] Hive QA commented on HIVE-22920: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994573/HIVE-22920.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18074 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20828/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20828/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20828/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12994573 - PreCommit-HIVE-Build
[jira] [Updated] (HIVE-22829) Decimal64: NVL in vectorization miss NPE with CBO on
[ https://issues.apache.org/jira/browse/HIVE-22829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22829: Attachment: HIVE-22829.3.patch Status: Patch Available (was: Open) > Decimal64: NVL in vectorization miss NPE with CBO on > > > Key: HIVE-22829 > URL: https://issues.apache.org/jira/browse/HIVE-22829 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Gopal Vijayaraghavan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22829.3.patch > > > {code} > select > sum(NVL(ss_sales_price, 1.0BD)) > from store_sales where ss_sold_date_sk % = 1; > {code} > {code} > | notVectorizedReason: exception: > java.lang.NullPointerException stack trace: > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4754), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.fixDecimalDataTypePhysicalVariations(Vectorizer.java:4687), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.vectorizeSelectOperator(Vectorizer.java:4669), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperator(Vectorizer.java:5269), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChild(Vectorizer.java:977), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.doProcessChildren(Vectorizer.java:864), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateAndVectorizeOperatorTree(Vectorizer.java:834), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.access$2500(Vectorizer.java:245), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2103), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapOperators(Vectorizer.java:2055), > > 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(Vectorizer.java:2030), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:1185), > > org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:1017), > > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111), > > org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180), > ... | > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22829) Decimal64: NVL in vectorization miss NPE with CBO on
[ https://issues.apache.org/jira/browse/HIVE-22829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22829: Attachment: (was: HIVE-22829.3.patch)
[jira] [Updated] (HIVE-22829) Decimal64: NVL in vectorization miss NPE with CBO on
[ https://issues.apache.org/jira/browse/HIVE-22829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22829: Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[jira] [Commented] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044990#comment-17044990 ] Hive QA commented on HIVE-22920: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 11s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 50s{color} | {color:blue} standalone-metastore/metastore-common in master has 35 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 36s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 48s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20828/dev-support/hive-personality.sh | | git revision | master / 2a35bbc | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20828/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore/metastore-common common ql itests U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20828/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Add row format OpenCSVSerde to the metastore column managed list > > > Key: HIVE-22920 > URL: https://issues.apache.org/jira/browse/HIVE-22920 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22920.1.patch, HIVE-22920.2.patch > > > Add row format OpenCSVSerde to the metastore column managed list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22891) Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
[ https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22891: --- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks [~srahman] for your contribution and [~szita] for reviewing! > Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode > -- > > Key: HIVE-22891 > URL: https://issues.apache.org/jira/browse/HIVE-22891 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch, > HIVE-22891.03.patch > > > {code:java} > try { > // TODO: refactor this out > if (pathToPartInfo == null) { > MapWork mrwork; > if (HiveConf.getVar(conf, > HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) { > mrwork = (MapWork) Utilities.getMergeWork(jobConf); > if (mrwork == null) { > mrwork = Utilities.getMapWork(jobConf); > } > } else { > mrwork = Utilities.getMapWork(jobConf); > } > pathToPartInfo = mrwork.getPathToPartitionInfo(); > } PartitionDesc part = extractSinglePartSpec(hsplit); > inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part); > } catch (HiveException e) { > throw new IOException(e); > } > {code} > The above piece of code in CombineHiveRecordReader.java was introduced in > HIVE-15147. This overwrites inputFormat based on the PartitionDesc which is > not required in non-LLAP mode of execution as the method > HiveInputFormat.wrapForLlap() simply returns the previously defined > inputFormat in case of non-LLAP mode. The method call extractSinglePartSpec() > has some serious performance implications. If there are large no. of small > files, each call in the method extractSinglePartSpec() takes approx ~ (2 - 3) > seconds. Hence the same query which runs in Hive 1.x / Hive 2 is way faster > than the query run on latest hive. 
> {code:java} > 2020-02-11 07:15:04,701 INFO [main] > org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from > 2020-02-11 07:15:06,468 WARN [main] > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions > found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, > hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}} > 2020-02-11 07:15:06,468 INFO [main] > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting > org.apache.hadoop.mapred.FileSplit{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
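The description argues that wrapForLlap is a no-op outside LLAP mode, so the expensive extractSinglePartSpec call can be skipped entirely there. A hedged sketch of that guard — method names and return types are simplified stand-ins for Hive's real API, and the string result is only for illustration:

```java
// Hypothetical sketch: only pay for partition-spec extraction when LLAP I/O
// can actually use the result; in non-LLAP mode return the input format as-is.
public class LlapGuard {
    // Stands in for the expensive extractSinglePartSpec() call (~2-3s per
    // small file, per the JIRA description).
    static String extractSinglePartSpec() {
        return "part-spec";
    }

    static String wrapForLlap(String inputFormat, boolean llapEnabled) {
        if (!llapEnabled) {
            // Non-LLAP: wrapForLlap would return inputFormat unchanged anyway,
            // so skip the costly extraction up front.
            return inputFormat;
        }
        String part = extractSinglePartSpec();
        return "llap(" + inputFormat + "," + part + ")";
    }

    public static void main(String[] args) {
        System.out.println(wrapForLlap("orc", false)); // unchanged, no extraction
        System.out.println(wrapForLlap("orc", true));  // wrapped with part spec
    }
}
```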
[jira] [Updated] (HIVE-22891) Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
[ https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22891: --- Summary: Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode (was: Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode) > Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode > -- > > Key: HIVE-22891 > URL: https://issues.apache.org/jira/browse/HIVE-22891 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch, > HIVE-22891.03.patch > > > {code:java} > try { > // TODO: refactor this out > if (pathToPartInfo == null) { > MapWork mrwork; > if (HiveConf.getVar(conf, > HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) { > mrwork = (MapWork) Utilities.getMergeWork(jobConf); > if (mrwork == null) { > mrwork = Utilities.getMapWork(jobConf); > } > } else { > mrwork = Utilities.getMapWork(jobConf); > } > pathToPartInfo = mrwork.getPathToPartitionInfo(); > } PartitionDesc part = extractSinglePartSpec(hsplit); > inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part); > } catch (HiveException e) { > throw new IOException(e); > } > {code} > The above piece of code in CombineHiveRecordReader.java was introduced in > HIVE-15147. This overwrites inputFormat based on the PartitionDesc which is > not required in non-LLAP mode of execution as the method > HiveInputFormat.wrapForLlap() simply returns the previously defined > inputFormat in case of non-LLAP mode. The method call extractSinglePartSpec() > has some serious performance implications. If there are large no. of small > files, each call in the method extractSinglePartSpec() takes approx ~ (2 - 3) > seconds. Hence the same query which runs in Hive 1.x / Hive 2 is way faster > than the query run on latest hive. 
> {code:java} > 2020-02-11 07:15:04,701 INFO [main] > org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from > 2020-02-11 07:15:06,468 WARN [main] > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions > found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, > hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}} > 2020-02-11 07:15:06,468 INFO [main] > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting > org.apache.hadoop.mapred.FileSplit{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22927) LLAP should filter guaranteed tasks before killing in node heartbeat
[ https://issues.apache.org/jira/browse/HIVE-22927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044971#comment-17044971 ] Prasanth Jayachandran commented on HIVE-22927: -- Instead of killing the error'ed out task attempts the original patch killed all attempts from a pinging node. This patch catches only the error'ed attempts and issues kill on them. +1, lgtm. Thanks for fixing! > LLAP should filter guaranteed tasks before killing in node heartbeat > - > > Key: HIVE-22927 > URL: https://issues.apache.org/jira/browse/HIVE-22927 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22927.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22927) LLAP should filter guaranteed tasks before killing in node heartbeat
[ https://issues.apache.org/jira/browse/HIVE-22927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-22927: Assignee: Rajesh Balamohan > LLAP should filter guaranteed tasks before killing in node heartbeat > - > > Key: HIVE-22927 > URL: https://issues.apache.org/jira/browse/HIVE-22927 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22927.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22832) Parallelise direct insert directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-22832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044948#comment-17044948 ] Hive QA commented on HIVE-22832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994558/HIVE-22832.6.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 18073 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] (batchId=45) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_transactional_full_acid] (batchId=87) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_acid_no_masking] (batchId=27) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=185) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_missing_cols] (batchId=172) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[change_allowincompatible_vectorization_false_date] (batchId=187) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_mm] (batchId=167) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_only_empty_query] (batchId=188) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_overwrite] (batchId=170) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part] (batchId=173) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_llap_io] (batchId=191) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_part_llap_io] (batchId=166) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_stats] (batchId=189) 
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeCase (batchId=361) org.apache.hadoop.hive.ql.TestTxnCommands.testMergeDeleteUpdate (batchId=361) org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier (batchId=361) org.apache.hadoop.hive.ql.TestTxnCommands.testQuotedIdentifier2 (batchId=361) org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge (batchId=342) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge (batchId=356) org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testMergeCase (batchId=342) org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testMergeDeleteUpdate (batchId=342) org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testQuotedIdentifier (batchId=342) org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testQuotedIdentifier2 (batchId=342) org.apache.hadoop.hive.ql.TestTxnLoadData.testValidations (batchId=317) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testForParallelBootstrapLoad (batchId=260) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testMetadataOnlyDump (batchId=260) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testNonParallelBootstrapLoad (batchId=260) org.apache.hadoop.hive.ql.parse.TestStatsReplicationScenariosACID.testRetryFailure (batchId=260) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20827/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20827/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20827/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 29 tests failed {noformat} This 
message is automatically generated. ATTACHMENT ID: 12994558 - PreCommit-HIVE-Build > Parallelise direct insert directory cleaning process > > > Key: HIVE-22832 > URL: https://issues.apache.org/jira/browse/HIVE-22832 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22832.1.patch, HIVE-22832.2.patch, > HIVE-22832.3.patch, HIVE-22832.4.patch, HIVE-22832.5.patch, HIVE-22832.6.patch > > > Inside Utilities::handleDirectInsertTableFinalPath, the > cleanDirectInsertDirectories method is called sequentially for each element > of the directInsertDirectories list, which might have a large number of > elements depending on how many partitions were written. This current > sequential execution could be improved by parallelising the clean up process.
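The improvement proposed above — replacing the sequential loop over directInsertDirectories with parallel cleanup — can be sketched with a fixed thread pool. This is a hypothetical illustration of the pattern, not Hive's actual cleanDirectInsertDirectories code; the clean step is simulated:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: submit one cleanup task per directory and wait for all,
// so many partition directories are cleaned concurrently instead of one by one.
public class ParallelCleaner {
    // Simulates cleaning one directory; a real version would delete files.
    static String clean(String dir) {
        return dir + ":cleaned";
    }

    public static List<String> cleanAll(List<String> dirs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String d : dirs) {
                futures.add(pool.submit(() -> clean(d)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // surfaces any per-directory failure
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException("cleanup failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanAll(Arrays.asList("p=1", "p=2", "p=3"), 2));
    }
}
```

Collecting the futures in submission order keeps the results deterministic while the work itself runs concurrently.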
[jira] [Commented] (HIVE-22832) Parallelise direct insert directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-22832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044898#comment-17044898 ] Hive QA commented on HIVE-22832: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 0s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 1s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s{color} | {color:red} ql: The patch generated 5 new + 104 unchanged - 2 fixed = 109 total (was 106) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 47s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20827/dev-support/hive-personality.sh | | git revision | master / 2a35bbc | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20827/yetus/diff-checkstyle-ql.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20827/yetus/patch-asflicense-problems.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20827/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. 
> Parallelise direct insert directory cleaning process > > > Key: HIVE-22832 > URL: https://issues.apache.org/jira/browse/HIVE-22832 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22832.1.patch, HIVE-22832.2.patch, > HIVE-22832.3.patch, HIVE-22832.4.patch, HIVE-22832.5.patch, HIVE-22832.6.patch > > > Inside Utilities::handleDirectInsertTableFinalPath, the > cleanDirectInsertDirectories method is called sequentially for each element > of the directInsertDirectories list, which might have a large number of > elements depending on how many partitions were written. This current > sequential execution could be improved by parallelising the clean up process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22527) Hive on Tez : Job of merging small files will be submitted into another queue (default queue)
[ https://issues.apache.org/jira/browse/HIVE-22527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044841#comment-17044841 ] Naveen Gangam commented on HIVE-22527: -- [~zhangbutao] Could you please rebase the patch and attach a new patch for master so we could get this thru? Thanks > Hive on Tez : Job of merging samll files will be submitted into another queue > (default queue) > - > > Key: HIVE-22527 > URL: https://issues.apache.org/jira/browse/HIVE-22527 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0, 3.1.1 >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Blocker > Fix For: 3.1.0 > > Attachments: HIVE-22527-branch-3.1.0.patch, explain with merge > files.png, file merge job.png, hive logs.png > > > Hive on Tez. We enable small file merge configuration with set > *hive.merge.tezfiles=true*. So , There will be another job launched for > merging files after sql job. However, the merge file job is submitted into > another yarn queue, not the queue of current beeline client session. It seems > that the merging files job start a new tez session with new conf which is > different the current session conf, leading to the merging file job goes into > default queue. > > Attachment *hive logs.png* shows that current session queue is > *root.bdoc.production* ( String queueName = session.getQueueName();) incoming > queue name is *null* ( String confQueueName = > conf.get(TezConfiguration.TEZ_QUEUE_NAME);). In fact, we log in to the same > beeline client with *set tez.queue.name=* *root.bdoc.production,* and all > jobs should be submitted into the same queue including file merge job. 
> [https://github.com/apache/hive/blob/bcc7df95824831a8d2f1524e4048dfc23ab98c19/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L445] > [https://github.com/apache/hive/blob/bcc7df95824831a8d2f1524e4048dfc23ab98c19/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java#L446] > > Attachment *explain with merge files.png* shows that ** the stage-4 is > individual merge file job which is submitted into another yarn queue(default > queue), not the queue root.bdoc.production. -- This message was sent by Atlassian Jira (v8.3.4#803005)
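The log analysis above says the merge-file job's conf arrives with a null tez.queue.name, so it falls back to the default queue. A hedged sketch of the idea behind a fix — propagate the session's queue into the new job conf before launching the merge job. Plain maps stand in for HiveConf/TezConfiguration; only the key name tez.queue.name comes from the issue:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: inherit the session's queue name when building the
// conf for the follow-up file-merge job, so it lands in the same YARN queue.
public class QueuePropagation {
    static final String TEZ_QUEUE_NAME = "tez.queue.name";

    static Map<String, String> buildMergeJobConf(Map<String, String> sessionConf) {
        Map<String, String> mergeConf = new HashMap<>();
        String queue = sessionConf.get(TEZ_QUEUE_NAME);
        if (queue != null) {
            // Without this, the merge job's conf has no queue set and the
            // scheduler routes it to the default queue.
            mergeConf.put(TEZ_QUEUE_NAME, queue);
        }
        return mergeConf;
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        session.put(TEZ_QUEUE_NAME, "root.bdoc.production");
        System.out.println(buildMergeJobConf(session).get(TEZ_QUEUE_NAME));
    }
}
```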
[jira] [Commented] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044835#comment-17044835 ] Hive QA commented on HIVE-22925: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994561/HIVE-22925.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1447 failed/errored test(s), 18051 tests executed *Failed tests:* {noformat} TestJdbcWithMiniLlapArrow - did not produce a TEST-*.xml file (likely timed out) (batchId=290) org.apache.hadoop.hive.cli.TestKuduCliDriver.testCliDriver[kudu_complex_queries] (batchId=296) org.apache.hadoop.hive.cli.TestKuduCliDriver.testCliDriver[kudu_queries] (batchId=296) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_materialized_view_rewrite_ssb] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_timeseries] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_timestamptz2] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_topn] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_dynamic_partition] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_expressions] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_extractTime] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_floorTime] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_mv] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_semijoin_reduction_all_types] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test1] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_alter] (batchId=204) 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_insert] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_ts] (batchId=204) org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_basic] (batchId=305) org.apache.hadoop.hive.cli.TestMiniHiveKafkaCliDriver.testCliDriver[kafka_storage_handler] (batchId=305) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning] (batchId=160) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[alter_table_location2] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[alter_table_location3] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[bucket5] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[bucket6] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cte_2] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cte_4] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cttl] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynpart_cast] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[empty_dir_in_table] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[external_table_purge] (batchId=163) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[external_table_with_space_in_location_path] (batchId=163) 
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[file_with_header_footer] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[file_with_header_footer_aggregation] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[global_limit] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[import_exported_table] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[insert_into1] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[insert_into2] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_distinct] (batchId=164) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_merge] (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_nullscan] (batchId=159)
[jira] [Commented] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044789#comment-17044789 ] Hive QA commented on HIVE-22925: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 2s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 39s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 56s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s{color} | {color:red} common: The patch generated 3 new + 371 unchanged - 0 fixed = 374 total (was 371) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 20 new + 44 unchanged - 1 fixed = 64 total (was 45) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 49s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 58s{color} | {color:red} ql generated 1 new + 99 unchanged - 1 fixed = 100 total (was 100) {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20826/dev-support/hive-personality.sh | | git revision | master / 2a35bbc | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20826/yetus/diff-checkstyle-common.txt | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20826/yetus/diff-checkstyle-ql.txt | | javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-20826/yetus/diff-javadoc-javadoc-ql.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20826/yetus/patch-asflicense-problems.txt | | modules | C: common ql U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20826/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Implement TopNKeyFilter efficiency check > > > Key: HIVE-22925 > URL: https://issues.apache.org/jira/browse/HIVE-22925 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22925.1.patch > > > In certain cases the TopNKey filter might work in an inefficient way and adds > extra CPU overhead. For
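The (truncated) description above says the TopNKey filter can add CPU overhead without filtering anything. One plausible shape for an "efficiency check" is to sample how many rows the filter forwards and disable it when the forward ratio stays near 1.0. The counters, sample size, and threshold here are assumptions for illustration, not Hive's actual implementation:

```java
// Hypothetical sketch: track forwarded vs. total rows and switch the filter
// off once a sample shows it is passing nearly everything through.
public class TopNKeyEfficiency {
    long total = 0;
    long forwarded = 0;
    boolean disabled = false;

    // Called once per row with whether the filter let the row through.
    void record(boolean rowForwarded) {
        total++;
        if (rowForwarded) {
            forwarded++;
        }
    }

    // After sampleSize rows, a forward ratio above maxForwardRatio means the
    // filter is doing no useful work, so stop paying its per-row cost.
    void maybeDisable(long sampleSize, double maxForwardRatio) {
        if (!disabled && total >= sampleSize
                && (double) forwarded / total > maxForwardRatio) {
            disabled = true;
        }
    }

    public static void main(String[] args) {
        TopNKeyEfficiency f = new TopNKeyEfficiency();
        for (int i = 0; i < 100; i++) {
            f.record(true); // filter forwards every row in this sample
        }
        f.maybeDisable(100, 0.95);
        System.out.println(f.disabled);
    }
}
```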
[jira] [Updated] (HIVE-22827) Update Flatbuffer version
[ https://issues.apache.org/jira/browse/HIVE-22827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22827: --- Status: Patch Available (was: Reopened) > Update Flatbuffer version > - > > Key: HIVE-22827 > URL: https://issues.apache.org/jira/browse/HIVE-22827 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22827.99.patch, HIVE-22827.patch > > > Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more > up-to-date version, e.g. 1.6.0.1. Upgrade to that version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22827) Update Flatbuffer version
[ https://issues.apache.org/jira/browse/HIVE-22827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22827: --- Attachment: HIVE-22827.99.patch > Update Flatbuffer version > - > > Key: HIVE-22827 > URL: https://issues.apache.org/jira/browse/HIVE-22827 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22827.99.patch, HIVE-22827.patch > > > Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more > up-to-date version, e.g. 1.6.0.1. Upgrade to that version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-22827) Update Flatbuffer version
[ https://issues.apache.org/jira/browse/HIVE-22827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reopened HIVE-22827: > Update Flatbuffer version > - > > Key: HIVE-22827 > URL: https://issues.apache.org/jira/browse/HIVE-22827 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22827.patch > > > Hive currently uses Flatbuffer 1.2.0. Other Apache projects use a more > up-to-date version, e.g. 1.6.0.1. Upgrade to that version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21487) COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes
[ https://issues.apache.org/jira/browse/HIVE-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044764#comment-17044764 ] Hive QA commented on HIVE-21487: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994557/HIVE-21487.06.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18073 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=281) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20825/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20825/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20825/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12994557 - PreCommit-HIVE-Build > COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes > > > Key: HIVE-21487 > URL: https://issues.apache.org/jira/browse/HIVE-21487 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Todd Lipcon >Assignee: László Pintér >Priority: Major > Attachments: HIVE-21487.03.patch, HIVE-21487.04.patch, > HIVE-21487.05.patch, HIVE-21487.06.patch, HIVE-21847.01.patch, > HIVE-21847.02.patch > > > Looking at a MySQL install where HMS is pointed on Hive 3.1, I see a constant > stream of queries of the form: > {code} > select CC_STATE from COMPLETED_COMPACTIONS where CC_DATABASE = > 'tpcds_orc_exact_1000' and CC_TABLE = 'catalog_returns' and CC_PARTITION = > 'cr_returned_date_sk=2452851' and CC_STATE != 'a' order by CC_ID desc; > {code} > but the COMPLETED_COMPACTIONS table has no index. In this case it's resulting > in a full table scan over 115k rows, which takes around 100ms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
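An index covering the predicate columns of the query quoted above (CC_DATABASE, CC_TABLE, CC_PARTITION) would let the lookup avoid the full table scan. The DDL below is only an illustrative sketch; the index name and exact column set chosen by the attached patches may differ:

{code}
-- Illustrative only: a composite index matching the compaction-history lookup.
-- The actual HIVE-21487 patches may use a different name or column order.
CREATE INDEX COMPLETED_COMPACTIONS_IDX
    ON COMPLETED_COMPACTIONS (CC_DATABASE, CC_TABLE, CC_PARTITION);
{code}

With such an index, the per-partition CC_STATE lookup becomes an index range scan instead of a scan over all completed-compaction rows.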
[jira] [Commented] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044759#comment-17044759 ] Jesus Camacho Rodriguez commented on HIVE-22824: A couple of comments: - Can we add a test case for the issue? Currently the following seems to fail:
{code}
CREATE TABLE table1 (a INT, b INT);
INSERT INTO table1 VALUES (1, 2), (1, 2), (1, 2), (1, 2);
EXPLAIN CBO
SELECT sub1.r FROM
(
  SELECT RANK() OVER (ORDER BY t1.b desc) as r
  FROM table1 t1
  JOIN table1 t2 ON t1.a = t2.b
) sub1
LEFT OUTER JOIN table1 t3
ON sub1.r = t3.a;
{code}
- Can we create a Calcite issue and contribute the fix? If I understand correctly, this issue can lead to incorrect rewriting, and thus incorrect results. Further, please create a Hive issue / leave a note to remove the logic in Hive's {{JoinProjectTransposeRule}} once we upgrade? > JoinProjectTranspose rule should skip Projects containing windowing expression > -- > > Key: HIVE-22824 > URL: https://issues.apache.org/jira/browse/HIVE-22824 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, > HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Otherwise this rule could end up creating a plan with a windowing expression > within a join condition, which Hive doesn't know how to process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
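The guard being discussed can be sketched against Calcite's planner-rule API. The snippet below is only an illustrative sketch of the approach, not the actual HIVE-22824 patch:

{code}
// Illustrative sketch only (not the actual patch): make the transpose rule
// bail out when a matched Project computes a windowing expression, so that
// RANK()-style expressions can never be pushed into a join condition.
@Override
public boolean matches(RelOptRuleCall call) {
  for (RelNode rel : call.getRelList()) {
    if (rel instanceof Project
        && RexOver.containsOver(((Project) rel).getProjects(), null)) {
      return false;
    }
  }
  return true;
}
{code}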
[jira] [Commented] (HIVE-21487) COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes
[ https://issues.apache.org/jira/browse/HIVE-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044747#comment-17044747 ] Hive QA commented on HIVE-21487: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 1s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 52s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 35s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 22s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 52s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20825/dev-support/hive-personality.sh | | git revision | master / 2a35bbc | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20825/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore/metastore-server . U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20825/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes > > > Key: HIVE-21487 > URL: https://issues.apache.org/jira/browse/HIVE-21487 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Todd Lipcon >Assignee: László Pintér >Priority: Major > Attachments: HIVE-21487.03.patch, HIVE-21487.04.patch, > HIVE-21487.05.patch, HIVE-21487.06.patch, HIVE-21847.01.patch, > HIVE-21847.02.patch > > > Looking at a MySQL install where HMS is pointed on Hive 3.1, I see a constant > stream of queries of the form: > {code} > select CC_STATE from COMPLETED_COMPACTIONS where CC_DATABASE = > 'tpcds_orc_exact_1000' and CC_TABLE = 'catalog_returns' and CC_PARTITION = > 'cr_returned_date_sk=2452851' and CC_STATE != 'a' order by CC_ID desc; > {code} > but the COMPLETED_COMPACTIONS table has no index. In this case it's resulting > in a full table scan over 115k rows, which takes around 100ms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22910) CBO fails when subquery with rank left joined
[ https://issues.apache.org/jira/browse/HIVE-22910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-22910. --- Resolution: Duplicate HIVE-22824 > CBO fails when subquery with rank left joined > - > > Key: HIVE-22910 > URL: https://issues.apache.org/jira/browse/HIVE-22910 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > *Repro* > {code} > CREATE TABLE table1(a int, b int); > ANALYZE TABLE table1 COMPUTE STATISTICS FOR COLUMNS; > EXPLAIN CBO > SELECT sub1.r FROM > ( > SELECT > RANK() OVER (ORDER BY t1.b desc) as r > FROM table1 t1 > JOIN table1 t2 ON t1.a = t2.b > ) sub1 > LEFT OUTER JOIN table1 t3 > ON sub1.r = t3.a; > {code} > {code} > See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, > or check ./ql/target/surefire-reports or > ./itests/qtest/target/surefire-reports/ for specific test cases logs. > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column > reference 'b': (possible column names are: $hdt$_0.a, $hdt$_0.b, $hdt$_1.b) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:13089) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:13031) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12999) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinKeys(SemanticAnalyzer.java:9248) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinOperator(SemanticAnalyzer.java:9409) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinPlan(SemanticAnalyzer.java:9624) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11781) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11661) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:534) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12547) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:361) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:183) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:594) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:540) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:534) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:249) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:193) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:415) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:346) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:709) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:679) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:169) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:59) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at >
[jira] [Updated] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22920: Attachment: HIVE-22920.2.patch Status: Patch Available (was: Open) > Add row format OpenCSVSerde to the metastore column managed list > > > Key: HIVE-22920 > URL: https://issues.apache.org/jira/browse/HIVE-22920 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22920.1.patch, HIVE-22920.2.patch > > > Add row format OpenCSVSerde to the metastore column managed list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22920: Attachment: (was: HIVE-22920.2.patch) > Add row format OpenCSVSerde to the metastore column managed list > > > Key: HIVE-22920 > URL: https://issues.apache.org/jira/browse/HIVE-22920 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22920.1.patch > > > Add row format OpenCSVSerde to the metastore column managed list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22920) Add row format OpenCSVSerde to the metastore column managed list
[ https://issues.apache.org/jira/browse/HIVE-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22920: Status: Open (was: Patch Available) > Add row format OpenCSVSerde to the metastore column managed list > > > Key: HIVE-22920 > URL: https://issues.apache.org/jira/browse/HIVE-22920 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22920.1.patch > > > Add row format OpenCSVSerde to the metastore column managed list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?focusedWorklogId=392747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392747 ] ASF GitHub Bot logged work on HIVE-22824: - Author: ASF GitHub Bot Created on: 25/Feb/20 18:08 Start Date: 25/Feb/20 18:08 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on pull request #897: HIVE-22824: JoinProjectTranspose rule should skip Projects containing… URL: https://github.com/apache/hive/pull/897#discussion_r384034467 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -487,7 +483,7 @@ Operator genOPTree(PlannerContext plannerCtx) throws SemanticException { ASTNode newAST = getOptimizedAST(newPlan); // 1.1. Fix up the query for insert/ctas/materialized views -newAST = fixUpAfterCbo(this.getAST(), newAST, cboCtx); Review comment: @kgyrtkirk The original issue for which HIVE-22578 was opened is being fixed by HIVE-22824 (this pull request's change). The CBO path was failing because the JoinProjectTranspose rule was removing the Project containing the windowing expression (creating a wrong AST). Falling back to the non-CBO path should happen only for queries for which CBO isn't supported (and that will happen before fixUpAfterCbo). So I believe it is okay to change the AST at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392747) Time Spent: 0.5h (was: 20m) > JoinProjectTranspose rule should skip Projects containing windowing expression > -- > > Key: HIVE-22824 > URL: https://issues.apache.org/jira/browse/HIVE-22824 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, > HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Otherwise this rule could end up creating plan with windowing expression > within join condition which hive doesn't know how to process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
[ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044703#comment-17044703 ] Hive QA commented on HIVE-22819: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994555/HIVE-22819.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 18044 tests executed *Failed tests:* {noformat} TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=161) [unionDistinct_1.q,table_nonprintable.q,file_with_header_footer_aggregation.q,orc_llap_counters1.q,mm_cttas.q,whroot_external1.q,global_limit.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,parquet_struct_type_vectorization.q,results_cache_diff_fs.q,parallel_colstats.q,load_hdfs_file_with_space_in_the_name.q,orc_merge3.q] org.apache.hadoop.hive.metastore.TestMetastoreHousekeepingLeaderEmptyConfig.testHouseKeepingThreadExistence (batchId=251) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20824/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20824/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20824/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12994555 - PreCommit-HIVE-Build > Refactor Hive::listFilesCreatedByQuery to make it faster for object stores > -- > > Key: HIVE-22819 > URL: https://issues.apache.org/jira/browse/HIVE-22819 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22819.1.patch, HIVE-22819.2.patch, > HIVE-22819.3.patch, HIVE-22819.4.patch, HIVE-22819.5.patch > > > {{Hive::listFilesCreatedByQuery}} does an exists(), an > isDir() and then a listing call. This can be expensive in object stores. We > should instead directly list the files in the directory (we'd have to handle > an exception if the directory does not exist, but issuing a single call to > the object store would most likely still end up being more performant). -- This message was sent by Atlassian Jira (v8.3.4#803005)
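The pattern HIVE-22819 describes — a single listing call that treats a missing directory as an empty result, instead of separate exists()/isDir() probes — can be sketched with java.nio in place of Hadoop's FileSystem API. The class and method names below are illustrative, not Hive's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListOnce {
    // One round trip to storage: list directly and map "directory missing"
    // (or "not a directory") to an empty result, rather than paying for
    // exists() + isDirectory() calls before the listing.
    public static List<Path> listFilesCreatedByQuery(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        } catch (NoSuchFileException | NotDirectoryException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("query-output");
        Files.createFile(dir.resolve("bucket_00000"));
        System.out.println(listFilesCreatedByQuery(dir).size());                 // 1
        System.out.println(listFilesCreatedByQuery(dir.resolve("gone")).size()); // 0
    }
}
```

Against an object store, each of the three original calls is a remote request, so collapsing them into one listing (plus exception handling for the rare missing-directory case) saves two round trips on the common path.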
[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions
[ https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044699#comment-17044699 ] Vineet Garg commented on HIVE-22453: [~touchida] For some reason I am unable to apply the patch on upstream master. Can you rebase and re-upload? > Describe table unnecessarily fetches partitions > --- > > Key: HIVE-22453 > URL: https://issues.apache.org/jira/browse/HIVE-22453 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.2, 2.3.6 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Minor > Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, > HIVE-22453.3.patch, HIVE-22453.patch > > > The simple describe table command without EXTENDED and FORMATTED (i.e., > DESCRIBE table_name) fetches all partitions when no partition is specified, > although it does not display partition statistics in nature. > The command should not fetch partitions since it can take a long time for a > large amount of partitions. > For instance, in our environment, the command takes around 8 seconds for a > table with 8760 (24 * 365) partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22771) Partition location incorrectly formed in FileOutputCommitterContainer
[ https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-22771: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Fix has been pushed to master. Thank you for the contribution [~shivam-mohan] > Partition location incorrectly formed in FileOutputCommitterContainer > - > > Key: HIVE-22771 > URL: https://issues.apache.org/jira/browse/HIVE-22771 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Shivam >Assignee: Shivam >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22771.2.patch, HIVE-22771.3.patch, > HIVE-22771.4.patch, HIVE-22771.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Class _HCatOutputFormat_ in package _org.apache.hive.hcatalog.mapreduce_ uses > function _setOutput_ to generate _idHash_ using the statement below: > *+In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+* > *line 116: idHash = String.valueOf(Math.random());* > The output of idHash can take values like 7.145347157239135E-4. > > And class _FileOutputCommitterContainer_ in package > _org.apache.hive.hcatalog.mapreduce_ uses the statement below to compute the > final partition path: > +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+ > *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + > SCRATCH_DIR_NAME + "\\d\\.?\\d+", "");* > *line 367: partPath = new Path(finalLocn);* > > The regex used here is incorrect: it only removes the digits after > *SCRATCH_DIR_NAME*, and hence leaves 'E-4' (for the above example) in > the final partition location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
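The HIVE-22771 failure mode above is easy to reproduce with plain Java. In this sketch, SEP and SCRATCH stand in for Path.SEPARATOR and SCRATCH_DIR_NAME, and the "fixed" pattern is one plausible correction (matching an optional exponent), not necessarily the regex the committed patch uses:

```java
public class ScratchDirRegexDemo {
    // Hypothetical stand-ins for Path.SEPARATOR and SCRATCH_DIR_NAME.
    static final String SEP = "/";
    static final String SCRATCH = "_SCRATCH";

    // Digits-and-dot pattern as in the buggy line: the match stops before
    // a scientific-notation suffix such as "E-4".
    public static String stripBuggy(String loc) {
        return loc.replaceAll(SEP + SCRATCH + "\\d\\.?\\d+", "");
    }

    // Also consume an optional exponent so the whole idHash token is removed.
    public static String stripFixed(String loc) {
        return loc.replaceAll(SEP + SCRATCH + "\\d\\.?\\d+(E-?\\d+)?", "");
    }

    public static void main(String[] args) {
        // String.valueOf(Math.random()) can render in scientific notation:
        String loc = "/warehouse/t" + SEP + SCRATCH + "7.145347157239135E-4" + "/part=1";
        System.out.println(stripBuggy(loc)); // "/warehouse/tE-4/part=1" — stray "E-4"
        System.out.println(stripFixed(loc)); // "/warehouse/t/part=1"
    }
}
```

The buggy pattern leaves "E-4" glued onto the parent directory name, which is exactly how the malformed partition locations arise.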
[jira] [Updated] (HIVE-22889) Trim trailing and leading quotes for HCatCli query processing
[ https://issues.apache.org/jira/browse/HIVE-22889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-22889: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Fix has been committed to master. Thank you for the patch [~rameshkumar] > Trim trailing and leading quotes for HCatCli query processing > - > > Key: HIVE-22889 > URL: https://issues.apache.org/jira/browse/HIVE-22889 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22889.1.patch > > > Trim trailing and leading quotes for HCatCli query processing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
[ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044652#comment-17044652 ] Hive QA commented on HIVE-22819: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 6s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20824/dev-support/hive-personality.sh | | git revision | master / 1046517 | | Default Java | 1.8.0_111 | | findbugs | v3.0.1 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20824/yetus/patch-asflicense-problems.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20824/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Refactor Hive::listFilesCreatedByQuery to make it faster for object stores > -- > > Key: HIVE-22819 > URL: https://issues.apache.org/jira/browse/HIVE-22819 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22819.1.patch, HIVE-22819.2.patch, > HIVE-22819.3.patch, HIVE-22819.4.patch, HIVE-22819.5.patch > > > {color:#ff}Hive::listFilesCreatedByQuery{color} does an exists(), an > isDir() and then a listing call. This can be expensive in object stores. 
We > should instead directly list the files in the directory (we'd have to handle > an exception if the directory does not exist, but issuing a single call to > the object store would most likely still end up being more performant). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-20948) Eliminate file rename in compactor
[ https://issues.apache.org/jira/browse/HIVE-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044626#comment-17044626 ] Hive QA commented on HIVE-20948: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994554/HIVE-20948.07.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 18059 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20823/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20823/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20823/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12994554 - PreCommit-HIVE-Build > Eliminate file rename in compactor > -- > > Key: HIVE-20948 > URL: https://issues.apache.org/jira/browse/HIVE-20948 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 4.0.0 >Reporter: Eugene Koifman >Assignee: László Pintér >Priority: Major > Attachments: HIVE-20948.01.patch, HIVE-20948.02.patch, > HIVE-20948.03.patch, HIVE-20948.04.patch, HIVE-20948.05.patch, > HIVE-20948.06.patch, HIVE-20948.07.patch > > > Once HIVE-20823 is committed, we should investigate if it's possible to have > compactor write directly to base_x_cZ or delta_x_y_cZ. > For query based compaction: can we control location of temp table dir? We > support external temp tables so this may work but we'd need to have non-acid > insert create files with {{bucket_x}} names. > > For MR/Tez/LLAP based (should this be done at all?), need to figure out how > retries of tasks will work. 
Just like we currently generate an MR job to > compact, we should be able to generate a Tez job. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
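The idea above — having the compactor write directly into a `base_x_cZ` or `delta_x_y_cZ` directory instead of writing to a temp location and renaming — can be made concrete with a small naming sketch. This is an illustrative Python sketch of the ticket's notation only: Hive's actual implementation is Java, and the zero-padding width and `_c<id>` suffix format here are assumptions, not the real on-disk format.

```python
def compaction_dir_name(kind, min_write_id, max_write_id=None, compactor_id=None):
    """Build a compaction output directory name in the ticket's notation.

    kind is "base" or "delta"; compactor_id (the 'Z' in base_x_cZ) identifies
    the compactor run, letting the compactor write straight to the final
    directory instead of renaming a temp directory afterwards.
    The 7-digit padding is an illustrative assumption.
    """
    if kind == "base":
        name = "base_%07d" % min_write_id
    elif kind == "delta":
        name = "delta_%07d_%07d" % (min_write_id, max_write_id)
    else:
        raise ValueError("unknown kind: " + kind)
    if compactor_id is not None:
        # suffix marking this directory as compactor output
        name += "_c%d" % compactor_id
    return name
```

With a known-up-front name like this, a failed compactor attempt can simply be discarded by id rather than cleaned up via rename.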
[jira] [Commented] (HIVE-22918) Investigate empty bucket file creation for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044598#comment-17044598 ] Marton Bod commented on HIVE-22918: --- Based on discussions with [~lpinter], in theory, the lack of empty bucket files should pose no problem for compaction either. > Investigate empty bucket file creation for ACID tables > -- > > Key: HIVE-22918 > URL: https://issues.apache.org/jira/browse/HIVE-22918 > Project: Hive > Issue Type: Task >Affects Versions: 4.0.0 >Reporter: Marta Kuczora >Assignee: Marton Bod >Priority: Major > > When creating an insert-only bucketed table with 5 buckets, and we insert > only one row to this table, Hive creates empty files for the other 4 buckets. > This logic is in the code for ACID tables as well, but when checking the > table's final directory after the insert, I found that only one file got > created. When debugging this issue, I found that the empty files are created > in the staging directory outside the delta directory, therefore they won't > get copied by the move task to the final directory. This behavior seems > broken, but not sure if we really need the empty files in this case. > This Jira is about investigating whether or not we need these empty files for > ACID tables and if we do, fix the code to have them for ACID tables as well. 
> Repro steps: > {noformat} > create table test_mm(key int, id int) clustered by (key) into 5 buckets > stored as orc tblproperties("transactional"="true", > "transactional_properties"="insert_only"); > insert into test_mm values (1,1); > {noformat} > The following files are present in the 'test_mm/delta_001_001_' > folder: > {noformat} > 244 Feb 21 12:08 00_0 > 0 Feb 21 12:08 01_0 > 0 Feb 21 12:08 02_0 > 0 Feb 21 12:08 03_0 > 0 Feb 21 12:08 04_0 > {noformat} > {noformat} > create table test_acid(key int, id int) clustered by (key) into 5 buckets > stored as orc tblproperties("transactional"="true"); > insert into test_acid values (1,1); > {noformat} > The following files are present in the 'test_acid/delta_001_001_' > folder: > {noformat} > 1 Feb 21 12:13 _orc_acid_version > 656 Feb 21 12:13 bucket_0 > {noformat} > However when stopping in the MoveTask with the debugger, it can be seen that > the staging directory contains the empty files, so they are generated. > However the 00_0 is not a file, it is a directory which contains the > delta directory and the data file. When moving the data file to the final > location, the move task will only move the files from the delta directory, so > the empty files won't be moved. > {noformat} > ll > test_acid/.hive-staging_hive_2020-02-21_12-16-58_615_787573577176141305-1/-ext-1 > > 96 Feb 21 12:17 00_0 > 0 Feb 21 12:17 01_0 > 0 Feb 21 12:17 02_0 > 0 Feb 21 12:17 03_0 > 0 Feb 21 12:17 04_0 > {noformat} > {noformat} > ll > test_acid/.hive-staging_hive_2020-02-21_12-16-58_615_787573577176141305-1/-ext-1/00_0/delta_001_001_ > > 1 Feb 21 12:17 _orc_acid_version > 656 Feb 21 12:17 bucket_0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22918) Investigate empty bucket file creation for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044586#comment-17044586 ] Marton Bod commented on HIVE-22918: --- Based on my investigation, the empty bucket files are only created if MR is used as the execution engine. When using Tez, the empty files are not created - not for MM tables and not in the staging directory either during ACID (non-direct) inserts. Additionally, when testing locally using MR as the engine, the empty bucket files created for MM tables did not seem to play any role - upon deleting them manually, the data could still be read back and compaction worked as well. It seems that their creation is most likely a side-effect/bug of how MR works under the hood. In conclusion, my suggestion would be to:
# keep the ACID logic as is, i.e. do not add the empty file creation logic to ACID, since it seems to be an MR-only phenomenon (which is a deprecated engine anyway)
# investigate whether the empty file creation could be removed from MR as well - for users of MR, the cost of creating empty files can be expensive for tables with many buckets. E.g. when using a table with 1024 buckets, inserting a single record will create 1023 empty files every time, slowing down the query execution considerably
-- This message was sent by Atlassian Jira (v8.3.4#803005)
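To make the "missing empty bucket files" situation concrete, here is a small Python sketch that, given a delta-directory listing and the table's bucket count, reports which bucket ids have no file at all. The `bucket_` prefix and the helper's name are illustrative assumptions; real Hive ACID file names are zero-padded and MM tables use a different naming scheme, so treat this as a sketch of the bookkeeping, not Hive's code.

```python
def missing_buckets(file_names, num_buckets, prefix="bucket_"):
    """Return the sorted bucket ids that have no file in the listing.

    Non-bucket entries (e.g. _orc_acid_version) are ignored. The naming
    convention is an illustrative assumption, not Hive's exact format.
    """
    present = set()
    for name in file_names:
        if name.startswith(prefix):
            try:
                # take the numeric id right after the prefix
                present.add(int(name[len(prefix):].split("_")[0]))
            except ValueError:
                pass  # not a bucket file after all
    return sorted(set(range(num_buckets)) - present)
```

Against the repro above, a reader (or compactor) can use such a check to decide whether absent buckets are simply empty, rather than relying on zero-byte placeholder files.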
[jira] [Commented] (HIVE-20948) Eliminate file rename in compactor
[ https://issues.apache.org/jira/browse/HIVE-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044581#comment-17044581 ] Hive QA commented on HIVE-20948:
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 4s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 2s{color} | {color:black} {color} |
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20823/dev-support/hive-personality.sh |
| git revision | master / 1046517 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20823/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20823/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |
This message was automatically generated.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Description: In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example if the rows are coming in descending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high. was: In certain cases the TopNKey filter might work in an inefficient way and add extra CPU overhead. For example if the rows are coming in ascending order but the filter wants the top N smallest elements, the filter will forward everything. Inefficiency should be detected at runtime so that the filter can be disabled if the ratio forwarded_rows/total_rows is too high. > Implement TopNKeyFilter efficiency check > > > Key: HIVE-22925 > URL: https://issues.apache.org/jira/browse/HIVE-22925 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22925.1.patch > > > In certain cases the TopNKey filter might work in an inefficient way and add > extra CPU overhead. For example if the rows are coming in descending order > but the filter wants the top N smallest elements, the filter will forward > everything. > Inefficiency should be detected at runtime so that the filter can be disabled > if the ratio forwarded_rows/total_rows is too high. -- This message was sent by Atlassian Jira (v8.3.4#803005)
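The runtime check described in this ticket — disable the filter when the forwarded_rows/total_rows ratio stays too high — can be sketched as below. This is a hedged Python illustration, not Hive's Java implementation; the `check_after` and `max_ratio` knobs are invented names for the idea of "check the ratio once enough rows have been seen".

```python
import heapq

class TopNKeyFilter:
    """Forward only rows whose key could be among the N smallest keys.

    Illustrative sketch: tracks its own efficiency and disables itself
    when it forwards nearly everything (e.g. keys arriving in descending
    order while we want the N smallest).
    """

    def __init__(self, n, check_after=64, max_ratio=0.8):
        self.n = n
        self.check_after = check_after  # assumed knob: rows before first check
        self.max_ratio = max_ratio      # assumed knob: allowed forward ratio
        self.heap = []                  # max-heap (negated) of current top-n keys
        self.total = 0
        self.forwarded = 0
        self.disabled = False

    def offer(self, key):
        """Return True if the row should be forwarded downstream."""
        self.total += 1
        if self.disabled:
            return True  # pass-through once deemed inefficient
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, -key)
            fwd = True
        elif key < -self.heap[0]:
            heapq.heapreplace(self.heap, -key)  # key enters the top-n
            fwd = True
        else:
            fwd = False  # key cannot be in the top-n: filtered out
        if fwd:
            self.forwarded += 1
        if (self.total >= self.check_after
                and self.forwarded / self.total > self.max_ratio):
            self.disabled = True  # not selective enough, stop paying the cost
        return fwd
```

With descending input and smallest-N semantics every row improves the top-N, so the ratio hits 1.0 and the filter switches itself off; with ascending input only the first N rows are forwarded and the filter stays active.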
[jira] [Commented] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
[ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044556#comment-17044556 ] Peter Vary commented on HIVE-22819: --- +1 > Refactor Hive::listFilesCreatedByQuery to make it faster for object stores > -- > > Key: HIVE-22819 > URL: https://issues.apache.org/jira/browse/HIVE-22819 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22819.1.patch, HIVE-22819.2.patch, > HIVE-22819.3.patch, HIVE-22819.4.patch, HIVE-22819.5.patch > > > {{Hive::listFilesCreatedByQuery}} does an exists(), an > isDir() and then a listing call. This can be expensive in object stores. We > should instead directly list the files in the directory (we'd have to handle > an exception if the directory does not exist, but issuing a single call to > the object store would most likely still end up being more performant). -- This message was sent by Atlassian Jira (v8.3.4#803005)
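The proposed refactor — one listing call with exception handling instead of exists() + isDir() + list, so the object store sees one round trip instead of three — looks roughly like this. Python's `os.listdir` stands in for the FileSystem/object-store client; the function name mirrors the Hive method but the code is only an illustration of the single-round-trip pattern, including the "treat a missing directory as no files" choice the ticket mentions.

```python
import os

def list_files_created_by_query(path):
    """List a query's output directory with a single call.

    Instead of probing with exists() and isdir() first (each a separate,
    potentially expensive round trip on an object store), go straight to
    the listing and fall back to an empty result on failure. Sketch only;
    the real Hive method is Java and returns FileStatus objects.
    """
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []  # directory was never created: no files from this query
    except NotADirectoryError:
        return []  # path exists but is a plain file
```

The design point is that the error path (missing directory) is rare, so paying for it with an exception is cheaper overall than paying for two extra existence probes on every call.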
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Attachment: HIVE-22925.1.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Status: Open (was: Patch Available)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Status: Patch Available (was: Open)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Attachment: (was: HIVE-22925.1.patch)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Status: Patch Available (was: Open)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Attachment: HIVE-22925.1.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Attachment: (was: HIVE-22925.1.patch)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Status: Open (was: Patch Available)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Status: Patch Available (was: Open)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22925) Implement TopNKeyFilter efficiency check
[ https://issues.apache.org/jira/browse/HIVE-22925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-22925: - Attachment: HIVE-22925.1.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22832) Parallelise direct insert directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-22832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod updated HIVE-22832: -- Attachment: HIVE-22832.6.patch > Parallelise direct insert directory cleaning process > > > Key: HIVE-22832 > URL: https://issues.apache.org/jira/browse/HIVE-22832 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22832.1.patch, HIVE-22832.2.patch, > HIVE-22832.3.patch, HIVE-22832.4.patch, HIVE-22832.5.patch, HIVE-22832.6.patch > > > Inside Utilities::handleDirectInsertTableFinalPath, the > cleanDirectInsertDirectories method is called sequentially for each element > of the directInsertDirectories list, which might have a large number of > elements depending on how many partitions were written. This current > sequential execution could be improved by parallelising the clean up process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
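The parallelisation suggested in HIVE-22832 can be sketched with a thread pool. Again a hedged Python illustration (Hive's `Utilities.cleanDirectInsertDirectories` is Java): `clean_one` stands in for whatever per-directory cleanup logic already exists, and `max_workers` is an invented knob, not a Hive configuration property.

```python
from concurrent.futures import ThreadPoolExecutor

def clean_direct_insert_directories(directories, clean_one, max_workers=8):
    """Run clean_one over each directory concurrently instead of sequentially.

    Directory cleanup is I/O-bound (filesystem/object-store calls), so a
    thread pool overlaps the round trips across many partitions.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(clean_one, d) for d in directories]
    # re-raise the first failure, matching the sequential loop's behaviour
    for f in futures:
        f.result()
```

The win grows with the number of written partitions: a table with thousands of partition directories turns thousands of serial round trips into a handful of concurrent batches.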
[jira] [Updated] (HIVE-21487) COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes
[ https://issues.apache.org/jira/browse/HIVE-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-21487: - Attachment: (was: HIVE-21487.06.patch) > COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes > > > Key: HIVE-21487 > URL: https://issues.apache.org/jira/browse/HIVE-21487 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Todd Lipcon >Assignee: László Pintér >Priority: Major > Attachments: HIVE-21487.03.patch, HIVE-21487.04.patch, > HIVE-21487.05.patch, HIVE-21487.06.patch, HIVE-21847.01.patch, > HIVE-21847.02.patch > > > Looking at a MySQL install where HMS is pointed on Hive 3.1, I see a constant > stream of queries of the form: > {code} > select CC_STATE from COMPLETED_COMPACTIONS where CC_DATABASE = > 'tpcds_orc_exact_1000' and CC_TABLE = 'catalog_returns' and CC_PARTITION = > 'cr_returned_date_sk=2452851' and CC_STATE != 'a' order by CC_ID desc; > {code} > but the COMPLETED_COMPACTIONS table has no index. In this case it's resulting > in a full table scan over 115k rows, which takes around 100ms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21487) COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes
[ https://issues.apache.org/jira/browse/HIVE-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-21487: - Attachment: HIVE-21487.06.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21487) COMPLETED_COMPACTIONS and COMPACTION_QUEUE table missing appropriate indexes
[ https://issues.apache.org/jira/browse/HIVE-21487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-21487: - Attachment: HIVE-21487.06.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
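The fix for HIVE-21487 amounts to adding a composite index covering the equality predicates of the hot lookup — (CC_DATABASE, CC_TABLE, CC_PARTITION) — so the metastore RDBMS can seek instead of scanning 115k rows. The sketch below demonstrates the idea with SQLite as a stand-in for the metastore database; the toy schema, sample rows, and index name are illustrative assumptions, not the actual Hive metastore DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE COMPLETED_COMPACTIONS (
    CC_ID INTEGER, CC_DATABASE TEXT, CC_TABLE TEXT,
    CC_PARTITION TEXT, CC_STATE TEXT)""")
# composite index covering the equality predicates of the hot query
conn.execute("""CREATE INDEX COMPLETED_COMPACTIONS_IDX
    ON COMPLETED_COMPACTIONS (CC_DATABASE, CC_TABLE, CC_PARTITION)""")
conn.executemany(
    "INSERT INTO COMPLETED_COMPACTIONS VALUES (?, ?, ?, ?, ?)",
    [(1, "db", "t", "p=1", "a"),
     (2, "db", "t", "p=1", "f"),
     (3, "db", "t", "p=2", "f")])
# the lookup pattern from the ticket
rows = conn.execute(
    """SELECT CC_STATE FROM COMPLETED_COMPACTIONS
       WHERE CC_DATABASE = ? AND CC_TABLE = ? AND CC_PARTITION = ?
         AND CC_STATE != 'a' ORDER BY CC_ID DESC""",
    ("db", "t", "p=1")).fetchall()
# the planner now does an index search rather than a full table scan
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT CC_STATE FROM COMPLETED_COMPACTIONS "
    "WHERE CC_DATABASE = 'db' AND CC_TABLE = 't' AND CC_PARTITION = 'p=1'"
).fetchall()
```

The same principle applies to MySQL/Postgres backing a real metastore: the three equality columns go first in the index so each compaction-history lookup touches only the matching partition's rows.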
[jira] [Updated] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
[ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod updated HIVE-22819: -- Attachment: HIVE-22819.5.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-20948) Eliminate file rename in compactor
[ https://issues.apache.org/jira/browse/HIVE-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-20948: - Attachment: HIVE-20948.07.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22904) Compaction cleaner cannot find COMPACTION_QUEUE table using postgres db
[ https://issues.apache.org/jira/browse/HIVE-22904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-22904: - Resolution: Fixed Status: Resolved (was: Patch Available) > Compaction cleaner cannot find COMPACTION_QUEUE table using postgres db > --- > > Key: HIVE-22904 > URL: https://issues.apache.org/jira/browse/HIVE-22904 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Attachments: HIVE-22904.01.patch, HIVE-22904.02.patch, > HIVE-22904.03.patch, HIVE-22904.04.patch, HIVE-22904.05.patch > > > In CompactionTxnHandler > {code:java} > delete from COMPACTION_QUEUE where cq_id = ? > {code} > fails with > {code:java} > org.postgresql.util.PSQLException: ERROR: relation "compaction_queue" does > not exist > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22904) Compaction cleaner cannot find COMPACTION_QUEUE table using postgres db
[ https://issues.apache.org/jira/browse/HIVE-22904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044495#comment-17044495 ] László Pintér commented on HIVE-22904: -- Pushed to master. Thanks [~zchovan] and [~pvary] for the review. > Compaction cleaner cannot find COMPACTION_QUEUE table using postgres db > --- > > Key: HIVE-22904 > URL: https://issues.apache.org/jira/browse/HIVE-22904 > Project: Hive > Issue Type: Bug >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Attachments: HIVE-22904.01.patch, HIVE-22904.02.patch, > HIVE-22904.03.patch, HIVE-22904.04.patch, HIVE-22904.05.patch > > > In CompactionTxnHandler > {code:java} > delete from COMPACTION_QUEUE where cq_id = ? > {code} > fails with > {code:java} > org.postgresql.util.PSQLException: ERROR: relation "compaction_queue" does > not exist > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
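A likely explanation for the error above is PostgreSQL's identifier case-folding: unquoted identifiers fold to lower case, so an unquoted COMPACTION_QUEUE resolves to the relation "compaction_queue" and will not match a table that was created under a quoted upper-case name. A minimal illustration of the behavior (hypothetical schema, not the actual metastore fix):

```sql
-- PostgreSQL folds unquoted identifiers to lower case, so this statement:
DELETE FROM COMPACTION_QUEUE WHERE cq_id = 1;
-- looks for the relation "compaction_queue". If the schema script created
-- the table with a quoted upper-case name, the reference must be quoted
-- the same way to match:
DELETE FROM "COMPACTION_QUEUE" WHERE cq_id = 1;
```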
[jira] [Commented] (HIVE-22819) Refactor Hive::listFilesCreatedByQuery to make it faster for object stores
[ https://issues.apache.org/jira/browse/HIVE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044471#comment-17044471 ] Steve Loughran commented on HIVE-22819: --- LGTM - this saves two round trips to HDFS, S3 or ABFS. > Refactor Hive::listFilesCreatedByQuery to make it faster for object stores > -- > > Key: HIVE-22819 > URL: https://issues.apache.org/jira/browse/HIVE-22819 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22819.1.patch, HIVE-22819.2.patch, > HIVE-22819.3.patch, HIVE-22819.4.patch > > > {color:#ff}Hive::listFilesCreatedByQuery{color} does an exists(), an > isDir() and then a listing call. This can be expensive in object stores. We > should instead directly list the files in the directory (we'd have to handle > an exception if the directory does not exist, but issuing a single call to > the object store would most likely still end up being more performant). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22853) Beeline should use HS2 server defaults for fetchSize
[ https://issues.apache.org/jira/browse/HIVE-22853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044468#comment-17044468 ] David Mollitor commented on HIVE-22853: --- Same thing in HiveConnection.java, the fetchSize should default to 0. > Beeline should use HS2 server defaults for fetchSize > > > Key: HIVE-22853 > URL: https://issues.apache.org/jira/browse/HIVE-22853 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Attachments: HIVE-22853.2.patch, HIVE-22853.3.patch, HIVE-22853.patch > > > Currently beeline uses a hard coded default of 1000 rows for fetchSize. This > default value is different from what the server has set. While the beeline > user can reset the value via set command, its cumbersome to change the > workloads. > Rather it should default to the server-side value and set should be used to > override within the session. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22853) Beeline should use HS2 server defaults for fetchSize
[ https://issues.apache.org/jira/browse/HIVE-22853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044466#comment-17044466 ] David Mollitor commented on HIVE-22853: --- {quote} if we follow the same spec as Oracle, we should treat less than 0 as invalid, and default to 0 and treat zero as no-limit. {quote} I do not think that is correct. From the docs: _If the value specified is zero, then the hint is ignored._ I think this means simply that the driver can use whatever default value it wants since the client application has not provided any kind of hint. So, actually, the 'default' value should be 0 in BeeLine.java and should be treated as 1000 (the current default) in the Driver itself... this behavior is correctly implemented in the driver: https://github.com/apache/hive/blob/037eacea46371015a7f9894c5a9ccfb9708d5c56/jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java#L811 > Beeline should use HS2 server defaults for fetchSize > > > Key: HIVE-22853 > URL: https://issues.apache.org/jira/browse/HIVE-22853 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Attachments: HIVE-22853.2.patch, HIVE-22853.3.patch, HIVE-22853.patch > > > Currently beeline uses a hard coded default of 1000 rows for fetchSize. This > default value is different from what the server has set. While the beeline > user can reset the value via set command, it's cumbersome to change the > workloads. > Rather it should default to the server-side value and set should be used to > override within the session. -- This message was sent by Atlassian Jira (v8.3.4#803005)
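The contract described in the comment above — 0 means "no hint given, fall back to the default" and negative values are invalid — can be sketched as follows. The class, constant, and method names are illustrative, not Hive's actual JDBC code.

```java
public class FetchSizeHint {
    // Driver-side fallback used only when the client gives no hint;
    // 1000 mirrors the current hard-coded default mentioned in the issue.
    static final int DRIVER_DEFAULT_FETCH_SIZE = 1000;

    // 0 = "hint ignored, use the default"; negative values are rejected,
    // following the Oracle-style contract quoted in the discussion.
    public static int effectiveFetchSize(int requested) {
        if (requested < 0) {
            throw new IllegalArgumentException("fetch size must be >= 0: " + requested);
        }
        return requested == 0 ? DRIVER_DEFAULT_FETCH_SIZE : requested;
    }
}
```

With this convention, BeeLine can simply pass 0 and let the driver (or server) pick the operative value.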
[jira] [Updated] (HIVE-22927) LLAP should filter guaranteed tasks before killing in node heartbeat
[ https://issues.apache.org/jira/browse/HIVE-22927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-22927: Summary: LLAP should filter guaranteed tasks before killing in node heartbeat (was: LLAP should filter guaranteed tasks for killing in node heartbeat ) > LLAP should filter guaranteed tasks before killing in node heartbeat > - > > Key: HIVE-22927 > URL: https://issues.apache.org/jira/browse/HIVE-22927 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-22927.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22832) Parallelise direct insert directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-22832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044420#comment-17044420 ] Hive QA commented on HIVE-22832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994541/HIVE-22832.5.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20822/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20822/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20822/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2020-02-25 13:08:16.695 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-20822/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2020-02-25 13:08:16.698 + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive 0767c5d..bbdf4c3 master -> origin/master + git reset --hard HEAD HEAD is now at 0767c5d HIVE-22825 : Reduce directory lookup cost for acid tables (Rajesh Balamohan via Ashutosh Chauhan) + git clean -f -d Removing standalone-metastore/metastore-server/src/gen/ + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at bbdf4c3 HIVE-22863: Commit compaction txn if it is opened but compaction is skipped (Karen Coppage via Laszlo Pinter) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2020-02-25 13:08:18.435 + rm -rf ../yetus_PreCommit-HIVE-Build-20822 + mkdir ../yetus_PreCommit-HIVE-Build-20822 + git gc + cp -R . ../yetus_PreCommit-HIVE-Build-20822 + mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-20822/yetus + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Trying to apply the patch with -p0 fatal: corrupt patch at line 87 Trying to apply the patch with -p1 fatal: corrupt patch at line 87 Trying to apply the patch with -p2 fatal: corrupt patch at line 87 The patch does not appear to apply with p0, p1, or p2 + result=1 + '[' 1 -ne 0 ']' + rm -rf yetus_PreCommit-HIVE-Build-20822 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12994541 - PreCommit-HIVE-Build > Parallelise direct insert directory cleaning process > > > Key: HIVE-22832 > URL: https://issues.apache.org/jira/browse/HIVE-22832 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22832.1.patch, HIVE-22832.2.patch, > HIVE-22832.3.patch, HIVE-22832.4.patch, HIVE-22832.5.patch > > > Inside Utilities::handleDirectInsertTableFinalPath, the > cleanDirectInsertDirectories method is called sequentially for each element > of the directInsertDirectories list, which might have a large number of > elements depending on how many partitions were written. This current > sequential execution could be improved by parallelising the clean up process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
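The parallelisation proposed in the HIVE-22832 description is commonly implemented by submitting one task per directory to a thread pool and then waiting on all the futures. A minimal sketch, with illustrative names rather than the actual Utilities code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelCleanup {
    static final AtomicInteger cleaned = new AtomicInteger();

    // Submit one cleanup task per directory and wait for all of them,
    // instead of cleaning each directory sequentially.
    public static int cleanAll(List<String> dirs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String dir : dirs) {
                futures.add(pool.submit(() -> cleanOne(dir)));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception thrown by a cleanup task
            }
        } finally {
            pool.shutdown();
        }
        return cleaned.get();
    }

    static void cleanOne(String dir) {
        // stand-in for the per-directory delete logic
        cleaned.incrementAndGet();
    }
}
```

Waiting on each Future before returning preserves the sequential version's error behavior: a failure in any directory's cleanup still propagates to the caller.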
[jira] [Commented] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044418#comment-17044418 ] Hive QA commented on HIVE-22872: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12994526/HIVE-22872.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 18057 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schq_analyze] (batchId=175) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schq_ingest] (batchId=184) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schq_materialized] (batchId=182) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=176) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb_schq] (batchId=181) org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query70] (batchId=305) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testCleanup[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testCleanup[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testCreate[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testCreate[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testDuplicateCreate[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testDuplicateCreate[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testExclusivePoll[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testExclusivePoll[Remote] (batchId=231) 
org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testNormalDeleteWithExec[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testNormalDeleteWithExec[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testNormalDelete[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testNormalDelete[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testOutdatedCleanup[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testOutdatedCleanup[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testPoll[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testPoll[Remote] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testUpdate[Embedded] (batchId=231) org.apache.hadoop.hive.metastore.client.TestMetastoreScheduledQueries.testUpdate[Remote] (batchId=231) org.apache.hadoop.hive.ql.schq.TestScheduledQueryService.testScheduledQueryExecution (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.test10Minutes (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.test10Seconds (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.test4Hours (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.test4Hours2 (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testAlter (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testCreateFromNonDefaultDatabase (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testDay (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testDay2 (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testExecuteImmediate (batchId=357) 
org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testMinutes (batchId=357) org.apache.hadoop.hive.ql.schq.TestScheduledQueryStatements.testSimpleCreate (batchId=357) org.apache.hadoop.hive.schq.TestScheduledQueryIntegration.testScheduledQueryExecutionImpersonation (batchId=285) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20821/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20821/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20821/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 37 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12994526 - PreCommit-HIVE-Build > Support
[jira] [Commented] (HIVE-22863) Commit compaction txn if it is opened but compaction is skipped
[ https://issues.apache.org/jira/browse/HIVE-22863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044416#comment-17044416 ] László Pintér commented on HIVE-22863: -- Pushed to master. Thanks for the patch [~klcopp] > Commit compaction txn if it is opened but compaction is skipped > --- > > Key: HIVE-22863 > URL: https://issues.apache.org/jira/browse/HIVE-22863 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Attachments: HIVE-22863.01.patch, HIVE-22863.02.patch, > HIVE-22863.03.patch, HIVE-22863.04.patch, HIVE-22863.05.patch > > > Currently if a table does not have enough directories to compact, compaction > is skipped and the compaction is either (a) marked ready for cleaning or (b) > marked compacted. However, the txn the compaction runs in is never committed, > it remains open, so TXNS and TXN_COMPONENTS will never be cleared of > information about the attempted compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22863) Commit compaction txn if it is opened but compaction is skipped
[ https://issues.apache.org/jira/browse/HIVE-22863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér updated HIVE-22863: - Resolution: Fixed Status: Resolved (was: Patch Available) > Commit compaction txn if it is opened but compaction is skipped > --- > > Key: HIVE-22863 > URL: https://issues.apache.org/jira/browse/HIVE-22863 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Attachments: HIVE-22863.01.patch, HIVE-22863.02.patch, > HIVE-22863.03.patch, HIVE-22863.04.patch, HIVE-22863.05.patch > > > Currently if a table does not have enough directories to compact, compaction > is skipped and the compaction is either (a) marked ready for cleaning or (b) > marked compacted. However, the txn the compaction runs in is never committed, > it remains open, so TXNS and TXN_COMPONENTS will never be cleared of > information about the attempted compaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22832) Parallelise direct insert directory cleaning process
[ https://issues.apache.org/jira/browse/HIVE-22832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Bod updated HIVE-22832: -- Attachment: HIVE-22832.5.patch > Parallelise direct insert directory cleaning process > > > Key: HIVE-22832 > URL: https://issues.apache.org/jira/browse/HIVE-22832 > Project: Hive > Issue Type: Improvement >Reporter: Marton Bod >Assignee: Marton Bod >Priority: Major > Attachments: HIVE-22832.1.patch, HIVE-22832.2.patch, > HIVE-22832.3.patch, HIVE-22832.4.patch, HIVE-22832.5.patch > > > Inside Utilities::handleDirectInsertTableFinalPath, the > cleanDirectInsertDirectories method is called sequentially for each element > of the directInsertDirectories list, which might have a large number of > elements depending on how many partitions were written. This current > sequential execution could be improved by parallelising the clean up process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044398#comment-17044398 ] Hive QA commented on HIVE-22872: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 3s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 43s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 53s{color} | {color:blue} standalone-metastore/metastore-common in master has 35 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 18s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 1s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 53s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s{color} | {color:red} standalone-metastore/metastore-server: The patch generated 1 new + 295 unchanged - 0 fixed = 296 total (was 295) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s{color} | {color:red} ql: The patch generated 2 new + 8 unchanged - 1 fixed = 10 total (was 9) {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 42 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 17s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20821/dev-support/hive-personality.sh | | git revision | master / 0767c5d | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20821/yetus/diff-checkstyle-standalone-metastore_metastore-server.txt | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20821/yetus/diff-checkstyle-ql.txt | | whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-20821/yetus/whitespace-tabs.txt | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20821/yetus/patch-asflicense-problems.txt | | modules | C: standalone-metastore/metastore-common common metastore standalone-metastore/metastore-server ql U: . | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20821/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Support multiple executors for scheduled queries >
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044382#comment-17044382 ] Eugene Chung commented on HIVE-22126: - [~dlavati] Shading guava for Hive also requires shading the calcite modules. And it leads to changing the FQCN of the calcite-avatica JDBC driver. e.g. * org.apache.calcite.jdbc.Driver -> org.apache.hive.org.apache.calcite.jdbc.Driver I stopped there because I was not sure it was okay to change it. If changing the name of the driver is just an internal or test concern, I think it's okay. I have some free time these days, so I am going to investigate this again. > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch > > > The ql/pom.xml includes the complete guava library in hive-exec.jar: > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
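Shading guava in hive-exec would typically be done with the maven-shade-plugin's relocation mechanism, along these lines. This is a sketch: the shaded prefix and pattern list are assumptions, and the actual HIVE-22126 patch may differ. It also illustrates the side effect Eugene Chung raises, since relocating calcite rewrites the avatica JDBC driver's class name too.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite guava's packages so downstream users of hive-exec can
           bring their own guava version without conflicts. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
      <!-- Relocating calcite as well would rename, e.g.,
           org.apache.calcite.jdbc.Driver to
           org.apache.hive.org.apache.calcite.jdbc.Driver -->
    </relocations>
  </configuration>
</plugin>
```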
[jira] [Work logged] (HIVE-22824) JoinProjectTranspose rule should skip Projects containing windowing expression
[ https://issues.apache.org/jira/browse/HIVE-22824?focusedWorklogId=392505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392505 ] ASF GitHub Bot logged work on HIVE-22824: - Author: ASF GitHub Bot Created on: 25/Feb/20 11:47 Start Date: 25/Feb/20 11:47 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #897: HIVE-22824: JoinProjectTranspose rule should skip Projects containing… URL: https://github.com/apache/hive/pull/897#discussion_r383830139 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java ## @@ -487,7 +483,7 @@ Operator genOPTree(PlannerContext plannerCtx) throws SemanticException { ASTNode newAST = getOptimizedAST(newPlan); // 1.1. Fix up the query for insert/ctas/materialized views -newAST = fixUpAfterCbo(this.getAST(), newAST, cboCtx); Review comment: I don't see how this change will not reintroduce the issue fixed in HIVE-22578 because the "fixUpAfterCbo" makes calls to a function named replaceASTChild which changes the actual ast - and it may make it impossible to fallback to the non-cbo path This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392505) Time Spent: 20m (was: 10m) > JoinProjectTranspose rule should skip Projects containing windowing expression > -- > > Key: HIVE-22824 > URL: https://issues.apache.org/jira/browse/HIVE-22824 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22824.1.patch, HIVE-22824.2.patch, > HIVE-22824.3.patch, HIVE-22824.4.patch, HIVE-22824.5.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Otherwise this rule could end up creating plan with windowing expression > within join condition which hive doesn't know how to process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22781) Add ability to immediately execute a scheduled query
[ https://issues.apache.org/jira/browse/HIVE-22781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22781: -- Labels: pull-request-available (was: ) > Add ability to immediately execute a scheduled query > > > Key: HIVE-22781 > URL: https://issues.apache.org/jira/browse/HIVE-22781 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22781.01.patch, HIVE-22781.02.patch, > HIVE-22781.03.patch, HIVE-22781.04.patch, HIVE-22781.04.patch, > HIVE-22781.04.patch, HIVE-22781.05.patch, HIVE-22781.05.patch > > > there are some differences between when the system invokes the scheduled query and when the > user executes it in a shell - forcing the schedule to run might be useful in > developing/debugging schedules > something like: > {code} > alter scheduled query a execute > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22781) Add ability to immediately execute a scheduled query
[ https://issues.apache.org/jira/browse/HIVE-22781?focusedWorklogId=392486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392486 ] ASF GitHub Bot logged work on HIVE-22781: - Author: ASF GitHub Bot Created on: 25/Feb/20 11:19 Start Date: 25/Feb/20 11:19 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #896: HIVE-22781 schq execute URL: https://github.com/apache/hive/pull/896 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392486) Remaining Estimate: 0h Time Spent: 10m > Add ability to immediately execute a scheduled query > > > Key: HIVE-22781 > URL: https://issues.apache.org/jira/browse/HIVE-22781 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22781.01.patch, HIVE-22781.02.patch, > HIVE-22781.03.patch, HIVE-22781.04.patch, HIVE-22781.04.patch, > HIVE-22781.04.patch, HIVE-22781.05.patch, HIVE-22781.05.patch > > Time Spent: 10m > Remaining Estimate: 0h > > there are some differences between when the system invokes the scheduled query and when the > user executes it in a shell - forcing the schedule to run might be useful in > developing/debugging schedules > something like: > {code} > alter scheduled query a execute > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22881) Revise non-recommended Calcite api calls
[ https://issues.apache.org/jira/browse/HIVE-22881?focusedWorklogId=392485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392485 ] ASF GitHub Bot logged work on HIVE-22881: - Author: ASF GitHub Bot Created on: 25/Feb/20 11:18 Start Date: 25/Feb/20 11:18 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #919: HIVE-22881 rexutil usage URL: https://github.com/apache/hive/pull/919 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392485) Time Spent: 40m (was: 0.5h) > Revise non-recommended Calcite api calls > > > Key: HIVE-22881 > URL: https://issues.apache.org/jira/browse/HIVE-22881 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22881.01.patch, HIVE-22881.02.patch, > HIVE-22881.03.patch, HIVE-22881.03.patch, HIVE-22881.03.patch > > Time Spent: 40m > Remaining Estimate: 0h > > RexUtil.simplify* methods -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?focusedWorklogId=392484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392484 ] ASF GitHub Bot logged work on HIVE-22872: - Author: ASF GitHub Bot Created on: 25/Feb/20 11:17 Start Date: 25/Feb/20 11:17 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #924: HIVE-22872 schq executors URL: https://github.com/apache/hive/pull/924 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392484) Remaining Estimate: 0h Time Spent: 10m > Support multiple executors for scheduled queries > > > Key: HIVE-22872 > URL: https://issues.apache.org/jira/browse/HIVE-22872 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22872.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22872: -- Labels: pull-request-available (was: ) > Support multiple executors for scheduled queries > > > Key: HIVE-22872 > URL: https://issues.apache.org/jira/browse/HIVE-22872 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22872.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-22872: Attachment: HIVE-22872.01.patch > Support multiple executors for scheduled queries > > > Key: HIVE-22872 > URL: https://issues.apache.org/jira/browse/HIVE-22872 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-22872.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22872) Support multiple executors for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-22872: Status: Patch Available (was: Open) > Support multiple executors for scheduled queries > > > Key: HIVE-22872 > URL: https://issues.apache.org/jira/browse/HIVE-22872 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Attachments: HIVE-22872.01.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22919) StorageBasedAuthorizationProvider does not allow create databases after changing hive.metastore.warehouse.dir
[ https://issues.apache.org/jira/browse/HIVE-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041814#comment-17041814 ] Oleksiy Sayankin edited comment on HIVE-22919 at 2/25/20 10:39 AM: --- *FIXED*

*ROOT-CAUSE*
The root cause of the issue is not related to Storage Based Authorization; it is about the correct update of the Hive variable {{hive.metastore.warehouse.dir}}. As one can see from the exception:
{code}
FAILED: HiveException org.apache.hadoop.security.AccessControlException: User testuser1(user id 5001) does not have access to hdfs:/tmp/m2/m3.db
{code}
Hive tries to create the new database under {{hdfs:/tmp/m2}} despite the {{SET}} statement that was executed before:
{code}
SET hive.metastore.warehouse.dir=/tmp/m3;
{code}
This happens because {{StorageBasedAuthorizationProvider}} holds a {{Warehouse}} instance that caches the value of {{hive.metastore.warehouse.dir}}. When a user updates {{hive.metastore.warehouse.dir}} in the {{Configuration}} instance, this does not force the {{Warehouse}} object to refresh the value of {{hive.metastore.warehouse.dir}}, so it keeps the old one.

*SOLUTION*
Add an {{isWarehouseChanged()}} method to check whether {{hive.metastore.warehouse.dir}} has been changed, and recreate the {{Warehouse}} in {{StorageBasedAuthorizationProvider}} if so.

*EFFECTS*
{{StorageBasedAuthorizationProvider}} initialization.

was (Author: osayankin): *FIXED* *ROOT-CAUSE* The root-cause of the issue does not relate to Storage Base Authorization, it is about correct update of Hive variable {{hive.metastore.warehouse.dir}}. 
As one can see from the exception: {code} FAILED: HiveException org.apache.hadoop.security.AccessControlException: User testuser1(user id 5001) does not have access to hdfs:/tmp/m2/m3.db {code} Hive wants to create new database at {{hdfs:/tmp/m2}} despite the operator {{SET}} that was executed before {code} SET hive.metastore.warehouse.dir=/tmp/m3; {code} This happens because {{StorageBasedAuthorizationProvider}} has an instance of {{Warehouse}} object that has value for {{hive.metastore.warehouse.dir}}. When a user updates the {{hive.metastore.warehouse.dir}} in {{HiveConf}} instance, this action does not force {{Warehouse}} object to refresh the value of {{hive.metastore.warehouse.dir}} and hence it has the old one. *SOLUTION* Add {{isWarehouseChanged()}} method to check whether {{hive.metastore.warehouse.dir}} has been changed and recreate {{Warehouse}} in {{StorageBasedAuthorizationProvider}} if yes. *EFFECTS* {{StorageBasedAuthorizationProvider}} initialization. > StorageBasedAuthorizationProvider does not allow create databases after > changing hive.metastore.warehouse.dir > - > > Key: HIVE-22919 > URL: https://issues.apache.org/jira/browse/HIVE-22919 > Project: Hive > Issue Type: Bug >Reporter: Oleksiy Sayankin >Assignee: Oleksiy Sayankin >Priority: Major > Attachments: HIVE-22919.1.patch, HIVE-22919.2.patch, > HIVE-22919.3.patch > > > *ENVIRONMENT:* > Hive-2.3 > *STEPS TO REPRODUCE:* > 1. 
Configure Storage Based Authorization:
> {code:xml}
> <property>
>   <name>hive.security.authorization.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> 2. Create a few directories and change their owners and permissions:
> {code:java}
> hadoop fs -mkdir /tmp/m1
> hadoop fs -mkdir /tmp/m2
> hadoop fs -mkdir /tmp/m3
> hadoop fs -chown testuser1:testuser1 /tmp/m[1,3]
> hadoop fs -chmod 700 /tmp/m[1-3]
> {code}
> 3. Check permissions:
> {code:java}
> [test@node2 ~]$ hadoop fs -ls /tmp|grep m[1-3]
> drwx------   - testuser1 testuser1  0 2020-02-11 10:25 /tmp/m1
> drwx------   - test      test       0 2020-02-11 10:25 /tmp/m2
> drwx------   - testuser1 testuser1  1 2020-02-11 10:36 /tmp/m3
> [test@node2 ~]$
> {code}
> 4. Log into the Hive CLI using the embedded Hive Metastore as the *"testuser1"* user, with *"hive.metastore.warehouse.dir"* set to *"/tmp/m1"*:
> {code:java}
> sudo -u testuser1 hive --hiveconf hive.metastore.uris= --hiveconf hive.metastore.warehouse.dir=/tmp/m1
> {code}
> 5. Perform the next steps:
> {code:sql}
> -- 1. Check "hive.metastore.warehouse.dir" value:
> SET hive.metastore.warehouse.dir;
> -- 2. Set
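The staleness check proposed in the comment above can be sketched as follows. This is a minimal illustration, not the actual HIVE-22919 patch: the class, field, and method names (`WarehouseCache`, `refreshIfNeeded`) are hypothetical, and a plain `String` stands in for the real `Warehouse` object that `StorageBasedAuthorizationProvider` caches.

```java
// Hypothetical sketch of the HIVE-22919 fix idea: the provider caches state
// built from hive.metastore.warehouse.dir at init time, and a later
// "SET hive.metastore.warehouse.dir=..." does not refresh it. An
// isWarehouseChanged()-style check compares the cached value against the
// current configuration value and rebuilds the cached state when they differ.
public class WarehouseCache {
    private String cachedWarehouseDir; // dir the cached "Warehouse" was built for

    public WarehouseCache(String initialDir) {
        this.cachedWarehouseDir = initialDir;
    }

    // Analogue of the proposed isWarehouseChanged(): true when the value in
    // the live configuration no longer matches the cached one.
    public boolean isWarehouseChanged(String currentConfDir) {
        return !cachedWarehouseDir.equals(currentConfDir);
    }

    // Rebuild the cached state if the configured dir moved; without this,
    // CREATE DATABASE keeps resolving paths against the stale directory.
    public void refreshIfNeeded(String currentConfDir) {
        if (isWarehouseChanged(currentConfDir)) {
            cachedWarehouseDir = currentConfDir; // stands in for "new Warehouse(conf)"
        }
    }

    public String warehouseDir() {
        return cachedWarehouseDir;
    }

    public static void main(String[] args) {
        WarehouseCache cache = new WarehouseCache("/tmp/m1"); // initial hiveconf value
        // User runs: SET hive.metastore.warehouse.dir=/tmp/m3;
        cache.refreshIfNeeded("/tmp/m3");
        System.out.println(cache.warehouseDir());
    }
}
```

Without the `refreshIfNeeded` step, the cache would still answer `/tmp/m1` (or, as in the reported scenario, `/tmp/m2`) after the `SET`, which is exactly the stale-directory behavior the bug describes.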
[jira] [Updated] (HIVE-22919) StorageBasedAuthorizationProvider does not allow create databases after changing hive.metastore.warehouse.dir
[ https://issues.apache.org/jira/browse/HIVE-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksiy Sayankin updated HIVE-22919: Status: Patch Available (was: In Progress) > StorageBasedAuthorizationProvider does not allow create databases after > changing hive.metastore.warehouse.dir > - > > Key: HIVE-22919 > URL: https://issues.apache.org/jira/browse/HIVE-22919 > Project: Hive > Issue Type: Bug >Reporter: Oleksiy Sayankin >Assignee: Oleksiy Sayankin >Priority: Major > Attachments: HIVE-22919.1.patch, HIVE-22919.2.patch, > HIVE-22919.3.patch > >
> *ENVIRONMENT:*
> Hive-2.3
> *STEPS TO REPRODUCE:*
> 1. Configure Storage Based Authorization:
> {code:xml}
> <property>
>   <name>hive.security.authorization.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>
> <property>
>   <name>hive.security.metastore.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
> </property>
> <property>
>   <name>hive.metastore.pre.event.listeners</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
> </property>
> {code}
> 2. Create a few directories and change their owners and permissions:
> {code:java}
> hadoop fs -mkdir /tmp/m1
> hadoop fs -mkdir /tmp/m2
> hadoop fs -mkdir /tmp/m3
> hadoop fs -chown testuser1:testuser1 /tmp/m[1,3]
> hadoop fs -chmod 700 /tmp/m[1-3]
> {code}
> 3. Check permissions:
> {code:java}
> [test@node2 ~]$ hadoop fs -ls /tmp|grep m[1-3]
> drwx------   - testuser1 testuser1  0 2020-02-11 10:25 /tmp/m1
> drwx------   - test      test       0 2020-02-11 10:25 /tmp/m2
> drwx------   - testuser1 testuser1  1 2020-02-11 10:36 /tmp/m3
> [test@node2 ~]$
> {code}
> 4. 
Log into the Hive CLI using the embedded Hive Metastore as the *"testuser1"* user, with *"hive.metastore.warehouse.dir"* set to *"/tmp/m1"*:
> {code:java}
> sudo -u testuser1 hive --hiveconf hive.metastore.uris= --hiveconf hive.metastore.warehouse.dir=/tmp/m1
> {code}
> 5. Perform the next steps:
> {code:sql}
> -- 1. Check "hive.metastore.warehouse.dir" value:
> SET hive.metastore.warehouse.dir;
> -- 2. Set "hive.metastore.warehouse.dir" to a path to which the "testuser1" user does not have access:
> SET hive.metastore.warehouse.dir=/tmp/m2;
> -- 3. Try to create a database:
> CREATE DATABASE m2;
> -- 4. Set "hive.metastore.warehouse.dir" to a path to which the "testuser1" user has access:
> SET hive.metastore.warehouse.dir=/tmp/m3;
> -- 5. Try to create a database:
> CREATE DATABASE m3;
> {code}
> *ACTUAL RESULT:*
> Query 5 fails with the exception below. It does not honor the "hive.metastore.warehouse.dir" property:
> {code:java}
> hive> -- 5. Try to create a database:
> hive> CREATE DATABASE m3;
> FAILED: HiveException org.apache.hadoop.security.AccessControlException: User testuser1(user id 5001) does not have access to hdfs:/tmp/m2/m3.db
> hive>
> {code}
> *EXPECTED RESULT:*
> Query 5 creates a database. -- This message was sent by Atlassian Jira (v8.3.4#803005)