[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845002#comment-16845002 ]

Aihua Xu commented on HIVE-21718:

[~ngangam] Sorry for the late reply. I will take a look.

> Improvement performance of UpdateInputAccessTimeHook
>
>
> Key: HIVE-21718
> URL: https://issues.apache.org/jira/browse/HIVE-21718
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 2.1.1
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Priority: Major
> Attachments: HIVE-21718.2.patch, HIVE-21718.patch
>
>
> Currently, Hive does not update the lastAccessTime property for any entities
> when a query accesses them. Thus it is not possible to know when a table was
> last accessed.
> Hive does provide a configurable hook to HS2 that is executed as a pre-query
> hook prior to the query being executed. However, this hook is inefficient
> because for each table or partition it is attempting to update the time for, it
> executes an "alter table ... " command internally. This is bad because:
> 1) For a query touching 1000's of partitions, this hook takes forever to
> update them.
> 2) Meanwhile, it is holding up the original query from executing.
> So even though we do not recommend using the hook, because the reward is too
> little (having lastAccessTime updated), we realize there is no other means to
> achieve this.
> Also, we can improve the performance of the hook significantly by adding a
> new Thrift API on HMS to update the lastAccessTime on the database rows
> directly instead of going to the HMS front end for 1 entity at a time (leading to
> 1000's of HMS calls that lead to multiple 1000's of calls to the database).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
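
To make the cost argument in the issue description concrete, here is a minimal sketch of the difference between one call per entity and batched calls. It is not taken from the attached patches: the class name, the helper methods, the sample partition names and the batch size of 1000 are all hypothetical, and no Hive or HMS API is used. It only counts how many round trips each approach would need.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: compares per-entity updates with batched updates. */
class AccessTimeBatchSketch {

    /** Splits entity names into consecutive batches of at most batchSize. */
    static List<List<String>> toBatches(List<String> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < entities.size(); start += batchSize) {
            int end = Math.min(start + batchSize, entities.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<>(entities.subList(start, end)));
        }
        return batches;
    }

    /** Builds made-up partition names such as "ds=0", "ds=1", ... */
    static List<String> samplePartitions(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("ds=" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> partitions = samplePartitions(2500);
        // One call per entity, as the current hook is described to behave.
        int perEntityCalls = partitions.size();
        // One call per batch, assuming a bulk update API existed.
        int batchedCalls = toBatches(partitions, 1000).size();
        System.out.println("per-entity calls: " + perEntityCalls);
        System.out.println("batched calls:    " + batchedCalls);
    }
}
```

With 2500 partitions and an assumed batch size of 1000, the per-entity approach needs 2500 calls while the batched approach needs 3. The real saving would depend on how the new Thrift API groups its updates, which this sketch does not model.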
[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839822#comment-16839822 ]

Naveen Gangam commented on HIVE-21718:

Review posted to RB at https://reviews.apache.org/r/70645/
[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839819#comment-16839819 ]

Naveen Gangam commented on HIVE-21718:

[~aihuaxu] [~ychena] [~daijy] Could you please review this ? Thanks
[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839803#comment-16839803 ]

Hive QA commented on HIVE-21718:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12968701/HIVE-21718.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16008 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17211/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17211/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17211/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12968701 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839781#comment-16839781 ]

Hive QA commented on HIVE-21718:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 56s{color} | {color:blue} standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 16s{color} | {color:blue} standalone-metastore/metastore-server in master has 181 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 30s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s{color} | {color:red} standalone-metastore/metastore-common: The patch generated 1 new + 388 unchanged - 0 fixed = 389 total (was 388) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} standalone-metastore/metastore-server: The patch generated 5 new + 1638 unchanged - 0 fixed = 1643 total (was 1638) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s{color} | {color:red} ql: The patch generated 1 new + 208 unchanged - 0 fixed = 209 total (was 208) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 26s{color} | {color:red} standalone-metastore/metastore-server generated 3 new + 181 unchanged - 0 fixed = 184 total (was 181) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 5s{color} | {color:red} standalone-metastore_metastore-common generated 2 new + 45 unchanged - 0 fixed = 47 total (was 45) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 42s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:standalone-metastore/metastore-server |
| | org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updateLastAccessTime(Map, int) concatenates strings using + in a loop At MetaStoreDirectSql.java:[line 545] |
| | org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updateLastAccessTime(Map, int) passes a nonconstant String to an execute or addBatch method on an SQL statement At MetaStoreDirectSql.java:[line 558] |
| | org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updateLastAccessTime(Map, int) makes inefficient use of keySet iterator instead of entrySet iterator At MetaStoreDirectSql.java:[line 536] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP
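
The three FindBugs findings reported above describe common patterns, so a generic illustration of the usual remedies may help. The sketch below is not the code from MetaStoreDirectSql or from either patch. The class, the method names, the shape of the input map (table id to access time) and the TBLS, TBL_ID and LAST_ACCESS_TIME identifiers are assumptions made for illustration. It iterates with entrySet instead of keySet, builds the statement text with a StringBuilder, and uses `?` placeholders so that values could be bound through a java.sql.PreparedStatement rather than concatenated into the SQL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the usual fixes for the three reported patterns. */
class LastAccessSqlSketch {

    /** Groups ids by access time; entrySet avoids a second lookup per key. */
    static Map<Integer, List<Long>> groupIdsByTime(Map<Long, Integer> accessTimes) {
        Map<Integer, List<Long>> grouped = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> entry : accessTimes.entrySet()) {
            grouped.computeIfAbsent(entry.getValue(), t -> new ArrayList<>())
                   .add(entry.getKey());
        }
        return grouped;
    }

    /** Builds a parameterized UPDATE with idCount placeholders in the IN list. */
    static String buildUpdateSql(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("idCount must be positive");
        }
        // StringBuilder instead of String '+' inside the loop.
        StringBuilder sql = new StringBuilder(
            "UPDATE TBLS SET LAST_ACCESS_TIME = ? WHERE TBL_ID IN (");
        for (int i = 0; i < idCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        // Made-up ids and timestamps, for illustration only.
        Map<Long, Integer> accessTimes = new LinkedHashMap<>();
        accessTimes.put(1L, 1557900000);
        accessTimes.put(2L, 1557900000);
        accessTimes.put(3L, 1557900060);
        for (Map.Entry<Integer, List<Long>> group : groupIdsByTime(accessTimes).entrySet()) {
            System.out.println(buildUpdateSql(group.getValue().size())
                + "  -- bind time=" + group.getKey() + ", ids=" + group.getValue());
        }
    }
}
```

Whether the nonconstant-string warning can be cleared completely in the real method depends on how that code issues its queries, which this report does not show. A statement whose placeholder count varies is still built at run time, but the values themselves never become part of the SQL text.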
[jira] [Commented] (HIVE-21718) Improvement performance of UpdateInputAccessTimeHook
[ https://issues.apache.org/jira/browse/HIVE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839634#comment-16839634 ]

Hive QA commented on HIVE-21718:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12968685/HIVE-21718.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17209/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17209/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17209/

Messages:
{noformat}
This message was trimmed, see log for full details
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/slf4j/jul-to-slf4j/1.7.10/jul-to-slf4j-1.7.10.jar(org/slf4j/bridge/SLF4JBridgeHandler.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/DispatcherType.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/Filter.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/FilterChain.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/FilterConfig.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/ServletException.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/ServletRequest.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/ServletResponse.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/annotation/WebFilter.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/http/HttpServletRequest.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-runner/9.3.25.v20180904/jetty-runner-9.3.25.v20180904.jar(javax/servlet/http/HttpServletResponse.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceAudience$LimitedPrivate.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/classification/target/hive-classification-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceStability$Unstable.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/ByteArrayOutputStream.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/OutputStream.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Closeable.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/AutoCloseable.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Flushable.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(javax/xml/bind/annotation/XmlRootElement.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar(org/apache/commons/exec/ExecuteException.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/security/PrivilegedExceptionAction.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/ExecutionException.class)]]
[loading ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/util/concurrent/TimeoutException.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/hadoop/hadoop-common/3.1.0/hadoop-common-3.1.0.jar(org/apache/hadoop/fs/FileSystem.class)]]
[loading ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/shims/common/target/hive-shims-common-4.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/shims/HadoopShimsSecure.class)]]
[loading