[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696862#comment-14696862 ] Hive QA commented on HIVE-11502: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750428/HIVE-11502.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9357 tests executed *Failed tests:* {noformat} TestDummy - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4962/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4962/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4962/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750428 - PreCommit-HIVE-TRUNK-Build Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11502.1.patch, HIVE-11502.2.patch For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40) and is of type double, map-side aggregation is very slow. I ran the query for more than 3 hours and then had to kill it.
The same query can finish in 7 seconds if I turn off map-side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
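For readers unfamiliar with the setting: with hive.map.aggr enabled, each mapper keeps an in-memory hash table of group keys to partial aggregates and shuffles only the partials. The sketch below illustrates that general technique only; the class and method names are invented, and this is not Hive's GroupByOperator code.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of hash-based map-side aggregation for
//   select col1, max(col2) ... group by col1
// Each mapper folds rows into a partial-max table keyed by the
// group-by column, so only one record per distinct key is shuffled.
public class MapSideAggSketch {
    public static Map<Double, Double> partialMax(double[][] rows) {
        Map<Double, Double> partials = new HashMap<>();
        for (double[] row : rows) {
            // row[0] is the group-by key (col1), row[1] the value (col2);
            // merge() keeps the running max for each key
            partials.merge(row[0], row[1], Math::max);
        }
        return partials;
    }

    public static void main(String[] args) {
        double[][] rows = { {1.0, 5.0}, {1.0, 9.0}, {2.0, 3.0} };
        System.out.println(partialMax(rows)); // two entries: 1.0 -> max 9.0, 2.0 -> 3.0
    }
}
```

When the per-mapper hash table misbehaves (poor hashing or costly key comparison for double keys, for instance), this work can dominate the mapper, which would be consistent with the slowdown reported above.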
[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF
[ https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696883#comment-14696883 ] Amareshwari Sriramadasu commented on HIVE-10276: The implementation done here does not look SQL-compliant. For example, D is day of year in SimpleDateFormat (https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html), but month of the year in SQL: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format Implement date_format(timestamp, fmt) UDF - Key: HIVE-10276 URL: https://issues.apache.org/jira/browse/HIVE-10276 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Fix For: 1.2.0 Attachments: HIVE-10276.01.patch date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a value of String in the format specified by the Java date format fmt. Supported formats are listed here: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
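The incompatibility is easy to demonstrate: in java.text.SimpleDateFormat the pattern letter D means day of year, which is not what a SQL-style format string would mean by it. A small self-contained check (the helper name is mine, not from the patch):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;

// "D" in SimpleDateFormat is day-of-year: February 1 formats as "32"
// (31 days of January + 1), not a month-related field as a SQL
// DATE_FORMAT user might expect.
public class PatternMismatch {
    public static String dayOfYear(int year, int month, int dayOfMonth) {
        Calendar c = new GregorianCalendar(year, month - 1, dayOfMonth);
        return new SimpleDateFormat("D").format(c.getTime());
    }

    public static void main(String[] args) {
        System.out.println(dayOfYear(2015, 2, 1)); // 32
    }
}
```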
[jira] [Commented] (HIVE-11536) %TYPE and %ROWTYPE attributes in data type declaration
[ https://issues.apache.org/jira/browse/HIVE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696887#comment-14696887 ] Dmitry Tolpeko commented on HIVE-11536: --- An example: {code} DECLARE v src.key%TYPE; BEGIN SELECT key INTO v FROM src LIMIT 1; PRINT v; END {code} %TYPE and %ROWTYPE attributes in data type declaration -- Key: HIVE-11536 URL: https://issues.apache.org/jira/browse/HIVE-11536 Project: Hive Issue Type: Improvement Components: hpl/sql Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko %TYPE and %ROWTYPE attributes allow you to derive the data type from the corresponding table column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
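Conceptually, resolving v src.key%TYPE means looking up the declared type of column key in table src at declaration time. The toy resolver below shows only that idea; the in-memory catalog is a hypothetical stand-in for real table metadata, not the HPL/SQL implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy %TYPE resolver: "src.key%TYPE" -> declared type of column src.key.
public class TypeAttrSketch {
    // Hypothetical catalog of column -> declared type.
    static final Map<String, String> CATALOG = new HashMap<>();
    static {
        CATALOG.put("src.key", "STRING");
        CATALOG.put("src.value", "STRING");
    }

    public static String resolve(String ref) {
        if (!ref.endsWith("%TYPE")) {
            throw new IllegalArgumentException("not a %TYPE reference: " + ref);
        }
        String column = ref.substring(0, ref.length() - "%TYPE".length());
        String type = CATALOG.get(column);
        if (type == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        return type;
    }

    public static void main(String[] args) {
        // v would be declared with whatever type src.key has.
        System.out.println(resolve("src.key%TYPE")); // STRING
    }
}
```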
[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow
[ https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696920#comment-14696920 ] Yongzhi Chen commented on HIVE-11502: - The TestDummy failure is not related: it failed because of a FileNotFoundException: [exec] + javac -cp /home/hiveptest/54.147.251.176-hiveptest-2/maven/org/apache/hive/hive-exec/2.0.0-SNAPSHOT/hive-exec-2.0.0-SNAPSHOT.jar /tmp/UDFExampleAdd.java -d /tmp [exec] + jar -cf /tmp/udfexampleadd-1.0.jar -C /tmp UDFExampleAdd.class [exec] java.io.FileNotFoundException: /tmp/UDFExampleAdd.class (No such file or directory) [exec] at java.io.FileInputStream.open(Native Method) [exec] at java.io.FileInputStream.<init>(FileInputStream.java:146) [exec] at sun.tools.jar.Main.copy(Main.java:791) [exec] at sun.tools.jar.Main.addFile(Main.java:740) [exec] at sun.tools.jar.Main.create(Main.java:491) [exec] at sun.tools.jar.Main.run(Main.java:201) [exec] at sun.tools.jar.Main.main(Main.java:1177) Map side aggregation is extremely slow -- Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11502.1.patch, HIVE-11502.2.patch For a query like the following: {noformat} create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1; {noformat} If the group-by column has many different values (for example 40) and is of type double, map-side aggregation is very slow. I ran the query for more than 3 hours and then had to kill it. The same query can finish in 7 seconds if I turn off map-side aggregation with: {noformat} set hive.map.aggr = false; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10435) Make HiveSession implementation pluggable through configuration
[ https://issues.apache.org/jira/browse/HIVE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10435: -- Labels: TODOC2.0 (was: ) Make HiveSession implementation pluggable through configuration --- Key: HIVE-10435 URL: https://issues.apache.org/jira/browse/HIVE-10435 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Amareshwari Sriramadasu Assignee: Akshay Goyal Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-10435.1.patch, HIVE-10435.2.patch SessionManager in CLIService creates and keeps track of HiveSession. Right now, it creates HiveSessionImpl, which is one implementation of HiveSession. This improvement request is to make it pluggable through a configuration so that other implementations can be passed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
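The pluggability being requested is the usual configure-a-class-name-and-reflect pattern. A minimal sketch, with invented interface and class names (this is not the actual HiveSession/SessionManager wiring):

```java
// Configuration-driven instantiation: the implementation class name is
// read from configuration and constructed reflectively, with a
// built-in default when nothing is configured.
public class PluggableFactory {
    public interface Session { String name(); }

    public static class DefaultSession implements Session {
        public String name() { return "default"; }
    }

    public static Session create(String configuredClassName) {
        if (configuredClassName == null || configuredClassName.isEmpty()) {
            return new DefaultSession(); // nothing configured: use the default impl
        }
        try {
            return (Session) Class.forName(configuredClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + configuredClassName, e);
        }
    }

    public static void main(String[] args) {
        Session s = create(DefaultSession.class.getName());
        System.out.println(s.name()); // default
    }
}
```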
[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696536#comment-14696536 ] Hive QA commented on HIVE-11304: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750362/HIVE-11304.9.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9361 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4959/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4959/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4959/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750362 - PreCommit-HIVE-TRUNK-Build Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.10.patch, HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch Log4j2 has some great features that can benefit Hive significantly. Notable features include: 1) Performance (parameterized logging, low overhead when logging is disabled, etc.)
More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. Log4J1 EOL - https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces -- This message was sent by Atlassian JIRA (v6.3.4#6332)
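On point 1: the win of parameterized logging when a level is disabled is that message formatting is skipped entirely. The toy logger below only illustrates that guard; it is not the Log4j2 API.

```java
import java.text.MessageFormat;

// When debug is disabled, debug() returns before any formatting or
// string concatenation happens; callers pay only for the varargs call.
public class LazyLogSketch {
    public static boolean debugEnabled = false;
    public static int formatCount = 0;

    public static void debug(String pattern, Object... args) {
        if (!debugEnabled) {
            return; // level disabled: no formatting work at all
        }
        formatCount++;
        System.out.println(MessageFormat.format(pattern, args));
    }

    public static void main(String[] args) {
        debug("processed row {0} of {1}", 5, 100); // disabled: nothing formatted
        System.out.println(formatCount); // 0
    }
}
```

By contrast, a call like log.debug("row " + i + " of " + n) builds the string even when debug is off, which is the overhead parameterized logging avoids.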
[jira] [Commented] (HIVE-3983) Select on table with hbase storage handler fails with an SASL error
[ https://issues.apache.org/jira/browse/HIVE-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696575#comment-14696575 ] Arup Malakar commented on HIVE-3983: I don't have the setup to test this, will reopen if I see it again. Thanks! Select on table with hbase storage handler fails with an SASL error --- Key: HIVE-3983 URL: https://issues.apache.org/jira/browse/HIVE-3983 Project: Hive Issue Type: Bug Components: HBase Handler, Security Environment: hive-0.10 hbase-0.94.5.5 hadoop-0.23.3.1 hcatalog-0.5 Reporter: Arup Malakar Assignee: Swarnim Kulkarni The table is created using the following query: {code} CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz"); {code} Doing a select on the table launches a map-reduce job. But the job fails with the following error: {code} 2013-02-02 01:31:07,500 FATAL [IPC Server handler 3 on 40118] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1348093718159_1501_m_00_0 - exited : java.io.IOException: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:160) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:242) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37) at org.apache.hadoop.hbase.security.User.call(User.java:590) at org.apache.hadoop.hbase.security.User.access$700(User.java:51) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444) at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:203) at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:291) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104) at $Proxy12.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987) at
[jira] [Commented] (HIVE-10435) Make HiveSession implementation pluggable through configuration
[ https://issues.apache.org/jira/browse/HIVE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696527#comment-14696527 ] Lefty Leverenz commented on HIVE-10435: --- Doc note: This adds two configuration parameters (*hive.session.impl.classname* and *hive.session.impl.withugi.classname*) to HiveConf.java, so they need to be documented in the wiki. Added a TODOC2.0 label. * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Make HiveSession implementation pluggable through configuration --- Key: HIVE-10435 URL: https://issues.apache.org/jira/browse/HIVE-10435 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Amareshwari Sriramadasu Assignee: Akshay Goyal Labels: TODOC2.0 Fix For: 2.0.0 Attachments: HIVE-10435.1.patch, HIVE-10435.2.patch SessionManager in CLIService creates and keeps track of HiveSession. Right now, it creates HiveSessionImpl, which is one implementation of HiveSession. This improvement request is to make it pluggable through a configuration so that other implementations can be passed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.
[ https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696656#comment-14696656 ] Hive QA commented on HIVE-11548: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750373/HIVE-11548.1.patch {color:red}ERROR:{color} -1 due to 133 failed/errored test(s), 9356 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testPartedRead org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testUnpartedReadWrite org.apache.hive.hcatalog.mapreduce.TestHCatHiveThriftCompatibility.testDynamicCols org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteRead org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteReadMR org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteRead org.apache.hive.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteReadMR org.apache.hive.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[0] org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[4] org.apache.hive.hcatalog.pig.TestHCatLoader.testConvertBooleanToInt[5] org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[0] org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[3] 
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic[5] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[0] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataBasic[5] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[0] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[4] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes[5] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[0] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[1] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[2] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[3] org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic[5] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[0] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[1] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[2] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[3] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[0] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[1] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[2] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[3] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[5] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[0] 
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[1] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[2] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[3] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testSyntheticComplexSchema[5] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[1] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[2] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[3] org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[5] org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByPig[0] org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByPig[1]
[jira] [Commented] (HIVE-11556) HiveFilter.copy should take the condition given as a parameter
[ https://issues.apache.org/jira/browse/HIVE-11556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697073#comment-14697073 ] Ashutosh Chauhan commented on HIVE-11556: - +1 pending tests HiveFilter.copy should take the condition given as a parameter -- Key: HIVE-11556 URL: https://issues.apache.org/jira/browse/HIVE-11556 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11556.patch Currently the condition is taken from the original Filter. However, a new condition is given as an input parameter; the new Filter should use that condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
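The bug class described here, a copy method that silently ignores the argument it was given and reuses the original's field, can be shown in miniature. The names below are illustrative, not the actual HiveFilter/Calcite signatures:

```java
// A copy(...) that drops its parameter produces a node carrying the
// stale condition; the fix is simply to use the argument.
public class FilterCopy {
    public final String condition;

    public FilterCopy(String condition) { this.condition = condition; }

    // Buggy variant: newCondition is ignored.
    public FilterCopy copyBuggy(String newCondition) {
        return new FilterCopy(this.condition);
    }

    // Fixed variant: the caller-supplied condition is used.
    public FilterCopy copyFixed(String newCondition) {
        return new FilterCopy(newCondition);
    }

    public static void main(String[] args) {
        FilterCopy f = new FilterCopy("a > 1");
        System.out.println(f.copyBuggy("b < 2").condition); // a > 1  (stale)
        System.out.println(f.copyFixed("b < 2").condition); // b < 2
    }
}
```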
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697016#comment-14697016 ] Ashutosh Chauhan commented on HIVE-10631: - Can you create a review board for this? create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir); it could be big with an S3 warehouse location, especially for large existing partitions.
Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: basically, it could recursively scan the wrong S3 directory and do nothing with the result. I will add more details of the cases in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
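The shape of the reported problem: a stats update that is a guaranteed no-op (newDir hard-coded to true) can still trigger the directory listing that feeds it, which is cheap on HDFS but expensive on S3. A schematic sketch with invented names, not the real MetaStoreUtils code:

```java
// If newDir is always true, the update never records anything, yet the
// listing cost (a recursive warehouse scan in the real code) is still paid.
public class FastStatsSketch {
    public static int listings = 0;

    static long listWarehouseFiles() {
        listings++;           // stands in for an expensive S3/HDFS scan
        return 42;
    }

    public static boolean updateStatsFast(boolean newDir) {
        long files = listWarehouseFiles();
        if (newDir) {
            return false;     // brand-new directory: nothing to record
        }
        return files >= 0;    // pretend the stats were stored
    }

    public static void main(String[] args) {
        boolean updated = updateStatsFast(true); // the hard-coded 'true' path
        System.out.println(updated + " after " + listings + " listing(s)");
    }
}
```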
[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills
[ https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696671#comment-14696671 ] Gopal V commented on HIVE-11306: Patch .3 does not give the performance boost observed in patch .2. The crucial difference is that patch .3 does not really consider the bloom filter to be valid for spilled partitions. {code} + if (!bloom1.testLong(keyHash) && !isOnDisk(partitionId)) { {code} The isOnDisk check negates all the performance benefits of checking the bloom filter to avoid spilling. Add a bloom-1 filter for Hybrid MapJoin spills -- Key: HIVE-11306 URL: https://issues.apache.org/jira/browse/HIVE-11306 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, HIVE-11306.3.patch HIVE-9277 implemented spillable joins for Tez, which suffer from a corner-case performance issue when joining wide small tables against a narrow big table (like a user info table joined to an event stream). The fact that the wide table is spilled causes extra IO, even though the nDV of the join key might be in the thousands. A cheap bloom-1 filter would add a massive performance gain for such queries, massively cutting down on the spill IO costs for the big-table spills. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
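For context, a "bloom-1" filter is a single-hash Bloom filter: one bit per hashed key, so a negative probe proves the key was never inserted (and the expensive spilled-partition lookup can be skipped), while a positive probe may be a false positive. A minimal sketch, not the HIVE-11306 patch code:

```java
import java.util.BitSet;

// Single-hash Bloom filter over long key hashes. testLong() == false
// guarantees the key was never added, which is the cheap check that
// lets a join skip probing a spilled partition entirely.
public class Bloom1 {
    private final BitSet bits;
    private final int mask;

    public Bloom1(int sizePow2) {          // sizePow2 must be a power of two
        bits = new BitSet(sizePow2);
        mask = sizePow2 - 1;
    }

    private int slot(long keyHash) {
        // fold the 64-bit hash and clamp it into the bit array
        return (int) (keyHash ^ (keyHash >>> 32)) & mask;
    }

    public void addLong(long keyHash)     { bits.set(slot(keyHash)); }
    public boolean testLong(long keyHash) { return bits.get(slot(keyHash)); }

    public static void main(String[] args) {
        Bloom1 b = new Bloom1(1 << 16);
        b.addLong(123456789L);
        System.out.println(b.testLong(123456789L)); // true: possibly present
    }
}
```

Gopal's complaint above is that gating this test with an extra isOnDisk() check makes the filter useless for exactly the partitions where skipping the probe saves IO.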
[jira] [Updated] (HIVE-11472) ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row
[ https://issues.apache.org/jira/browse/HIVE-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11472: --- Attachment: HIVE-11472.2.patch ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row --- Key: HIVE-11472 URL: https://issues.apache.org/jira/browse/HIVE-11472 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: Performance Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11472.1.patch, HIVE-11472.2.patch For every row x column {code} int len = (int) lengths.next(); int offset = 0; byte[] bytes = new byte[len]; while (len > 0) { int written = stream.read(bytes, offset, len); if (written < 0) { throw new EOFException("Can't finish byte read from " + stream); } {code} https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1552 This is not a big issue until it misses the GC TLAB. From hadoop-2.6.x (HADOOP-10855) you can read into a Text directly. Possibly can create a different TreeReader from the factory for 2.6.x and use a DataInputStream per stream to prevent an allocation in the inner loop. {code} int len = (int) lengths.next(); result.readWithKnownLength(datastream, len); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
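The fix direction, replacing the per-row new byte[len] with a reused and rarely grown buffer (which is what reading with a known length into a reused Text achieves), can be sketched as follows; the class and method names here are mine, not the patch's:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Reads length-prefixed values into one growable buffer instead of
// allocating byte[len] per row, keeping the hot loop allocation-free.
public class ReusedBuffer {
    private byte[] buf = new byte[16];

    public String readValue(DataInputStream in, int len) {
        if (buf.length < len) {
            buf = new byte[Math.max(len, buf.length * 2)]; // grow rarely
        }
        try {
            in.readFully(buf, 0, len); // no per-row allocation here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(
                "hiveorc".getBytes(StandardCharsets.UTF_8)));
        ReusedBuffer r = new ReusedBuffer();
        System.out.println(r.readValue(in, 4)); // hive
        System.out.println(r.readValue(in, 3)); // orc
    }
}
```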
[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696754#comment-14696754 ] Hive QA commented on HIVE-11317: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750421/HIVE-11317.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9360 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testExceptions {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4961/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4961/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4961/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750421 - PreCommit-HIVE-TRUNK-Build ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: triage Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(). This is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, there is nothing to abort dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system.
Also, the streaming api doesn't call DbTxnManager at all. Need to move this logic into the Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. Also move the timeOutLocks() logic there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example. The symptom of this is that the system keeps showing more and more open transactions that don't seem to ever go away (and have no locks associated with them) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
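The proposed metastore-side reaper reduces to: periodically scan open transactions and abort those whose last heartbeat is older than the timeout, independent of any SQL client activity. A schematic sketch with invented data structures (not the TXNS table or the TxnHandler API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Given txnId -> last-heartbeat timestamps, return the txns a reaper
// thread should abort. In the real system this would run in batches,
// as multiple small(er) metastore DB transactions.
public class TxnReaper {
    public static List<Long> findTimedOut(Map<Long, Long> lastHeartbeat,
                                          long nowMillis, long timeoutMillis) {
        List<Long> toAbort = new ArrayList<>();
        for (Map.Entry<Long, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                toAbort.add(e.getKey()); // dead txn: no heartbeat within timeout
            }
        }
        return toAbort;
    }

    public static void main(String[] args) {
        Map<Long, Long> hb = new TreeMap<>();
        hb.put(1L, 0L);     // last heartbeat long ago: stale
        hb.put(2L, 9000L);  // recent heartbeat: keep
        System.out.println(findTimedOut(hb, 10000L, 5000L)); // [1]
    }
}
```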
[jira] [Commented] (HIVE-10602) optimize PTF for GC
[ https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696708#comment-14696708 ] Takanobu Asanuma commented on HIVE-10602: - Hi, [~sershe]. I'd like to try this jira. Could you assign it to me? Thanks. optimize PTF for GC --- Key: HIVE-10602 URL: https://issues.apache.org/jira/browse/HIVE-10602 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin see HIVE-10600 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11329) Column prefix in key of hbase column prefix map
[ https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wojciech Indyk updated HIVE-11329: -- Issue Type: Improvement (was: Bug) Column prefix in key of hbase column prefix map --- Key: HIVE-11329 URL: https://issues.apache.org/jira/browse/HIVE-11329 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Wojciech Indyk Assignee: Wojciech Indyk Priority: Minor Attachments: HIVE-11329.1.patch, HIVE-11329.2.patch When I create a table with an hbase column prefix https://issues.apache.org/jira/browse/HIVE-3725 I have the prefix in the result map in hive. E.g. a record in HBase: rowkey: 123, column: tag_one, value: 0.5; column: tag_two, value: 0.5. Representation in Hive via column prefix mapping "tag_.*": column: tag map<string,string> with key: tag_one, value: 0.5 and key: tag_two, value: 0.5. It should be: key: one, value: 0.5 and key: two, value: 0.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
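The requested behaviour is just a key transformation when the map is materialised: strip the configured prefix from each HBase qualifier before using it as a Hive map key. A sketch of the idea, not the HBaseSerDe code:

```java
import java.util.HashMap;
import java.util.Map;

// For a prefix mapping like "tag_.*", turn qualifiers {tag_one, tag_two}
// into map keys {one, two} while keeping the values untouched.
public class PrefixStrip {
    public static Map<String, String> stripPrefix(Map<String, String> raw,
                                                  String prefix) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String k = e.getKey();
            String stripped = k.startsWith(prefix) ? k.substring(prefix.length()) : k;
            out.put(stripped, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("tag_one", "0.5");
        raw.put("tag_two", "0.5");
        System.out.println(stripPrefix(raw, "tag_")); // keys: one, two
    }
}
```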
[jira] [Commented] (HIVE-11472) ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row
[ https://issues.apache.org/jira/browse/HIVE-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697079#comment-14697079 ] Hive QA commented on HIVE-11472: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750476/HIVE-11472.2.patch {color:green}SUCCESS:{color} +1 9358 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4965/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4965/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4965/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12750476 - PreCommit-HIVE-TRUNK-Build ORC StringDirectTreeReader is thrashing the GC due to byte[] allocation per row --- Key: HIVE-11472 URL: https://issues.apache.org/jira/browse/HIVE-11472 Project: Hive Issue Type: Bug Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: Performance Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11472.1.patch, HIVE-11472.2.patch For every row x column {code} int len = (int) lengths.next(); int offset = 0; byte[] bytes = new byte[len]; while (len > 0) { int written = stream.read(bytes, offset, len); if (written < 0) { throw new EOFException("Can't finish byte read from " + stream); } {code} https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1552 This is not a big issue until it misses the GC TLAB. From hadoop-2.6.x (HADOOP-10855) you can read into a Text directly.
Possibly we can create a different TreeReader from the factory for 2.6.x, use a DataInputStream per stream, and prevent an allocation in the inner loop. {code} int len = (int) lengths.next(); result.readWithKnownLength(datastream, len); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
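The reuse pattern the comment describes — read into a preallocated, growable buffer instead of a fresh byte[] per row — can be sketched roughly as below. This is a simplified illustration, not the actual TreeReaderFactory code; the class and method names here are invented:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReusableRead {
    // One buffer reused across rows; grows monotonically instead of
    // being reallocated per row (the behavior the JIRA complains about).
    private byte[] buf = new byte[0];

    // Reads exactly 'len' bytes from the stream into the reusable buffer.
    public byte[] readWithKnownLength(InputStream stream, int len) throws IOException {
        if (buf.length < len) {
            // Grow at least to len, doubling to amortize future growth.
            buf = new byte[Math.max(len, buf.length * 2)];
        }
        int offset = 0;
        while (len > 0) {
            int written = stream.read(buf, offset, len);
            if (written < 0) {
                throw new EOFException("Can't finish byte read from " + stream);
            }
            len -= written;
            offset += written;
        }
        return buf;
    }
}
```

Only the first `len` bytes of the returned array are valid for the current row, which matches how Text-style reuse works in practice.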
[jira] [Updated] (HIVE-10602) optimize PTF for GC
[ https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10602: Assignee: Takanobu Asanuma optimize PTF for GC --- Key: HIVE-10602 URL: https://issues.apache.org/jira/browse/HIVE-10602 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Takanobu Asanuma see HIVE-10600 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11525) Bucket pruning
[ https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697580#comment-14697580 ] Sergey Shelukhin commented on HIVE-11525: - Sure! Thanks! Bucket pruning -- Key: HIVE-11525 URL: https://issues.apache.org/jira/browse/HIVE-11525 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0 Reporter: Maciek Kocon Assignee: Takuya Fukudome Labels: gsoc2015 Logically and functionally bucketing and partitioning are quite similar - both provide a mechanism to segregate and separate the table's data based on its content. Thanks to that, significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design, where PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common, if not a standard feature, in all current RDBMS, while BUCKETING seems to be Hive specific only. In a way, BUCKETING could also be called hashing, or simply IMPLICIT PARTITIONING. Regardless of the fact that these two are recognised as two separate features available in Hive, there should be nothing to prevent leveraging the same existing query/join optimisations across the two. BUCKET pruning: enable the partition-PRUNING-equivalent optimisation for queries on BUCKETED tables. The simplest example is for queries like SELECT … FROM x WHERE colA=123123 to read only the relevant bucket file rather than all the file-buckets that belong to the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
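The pruning arithmetic behind the `colA=123123` example can be illustrated with a small sketch, assuming Hive's usual `(hashCode & Integer.MAX_VALUE) % numBuckets` bucket-assignment convention (for an int column the hash is the value itself). The class name is hypothetical; real pruning would live in the optimizer:

```java
class BucketPruneSketch {
    // Maps a column value's hash code to a bucket index, following the
    // assumed (hash & Integer.MAX_VALUE) % numBuckets convention.
    // The mask keeps the result non-negative for negative hash codes.
    static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }
}
```

With 32 buckets, a predicate `colA = 123123` pins the row down to bucket `123123 % 32 = 19`, so only that one bucket file needs to be scanned.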
[jira] [Commented] (HIVE-10602) optimize PTF for GC
[ https://issues.apache.org/jira/browse/HIVE-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697579#comment-14697579 ] Sergey Shelukhin commented on HIVE-10602: - Sure! Thanks for working on this. optimize PTF for GC --- Key: HIVE-10602 URL: https://issues.apache.org/jira/browse/HIVE-10602 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin see HIVE-10600 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697589#comment-14697589 ] Aaron Tokhy commented on HIVE-10631: Sorry, fixed it just now. create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and doing nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. 
Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: basically it could scan the wrong S3 directory recursively and do nothing with the result. I will add more details of the cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
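A hypothetical simplification of the reported flow, to make the wasted work visible: the directory scan runs unconditionally, while `newDir == true` short-circuits the actual stats update. The names below are invented for illustration, not the real metastore code:

```java
import java.util.concurrent.atomic.AtomicInteger;

class FastStatsSketch {
    // Counts directory listings so the wasted scan is observable.
    static final AtomicInteger scans = new AtomicInteger();

    // Simplified shape of the reported bug: the (possibly expensive, e.g. S3)
    // listing always happens, but newDir == true skips recording anything —
    // and the partitioned-table branch hard-codes newDir to true.
    static boolean updateStatsFast(boolean newDir) {
        scans.incrementAndGet();  // always pays the listing cost
        if (newDir) {
            return false;         // brand-new dir: nothing is recorded
        }
        // ... would record file count / total size into table parameters ...
        return true;
    }
}
```

So for every partitioned-table creation, the listing cost is paid and the result is thrown away.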
[jira] [Resolved] (HIVE-11272) LLAP: Execution order within LLAP daemons should consider query-specific priority assigned to fragments
[ https://issues.apache.org/jira/browse/HIVE-11272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-11272. --- Resolution: Fixed LLAP: Execution order within LLAP daemons should consider query-specific priority assigned to fragments --- Key: HIVE-11272 URL: https://issues.apache.org/jira/browse/HIVE-11272 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Fix For: llap Attachments: HIVE-11272.1.txt, HIVE-11272.2.txt It's currently looking at finishable state, start time and vertex parallelism. Vertex parallelism can be replaced by upstream parallelism as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10600) optimize group by for GC
[ https://issues.apache.org/jira/browse/HIVE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10600: Assignee: Matt McCline optimize group by for GC Key: HIVE-10600 URL: https://issues.apache.org/jira/browse/HIVE-10600 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Matt McCline Quoting [~gopalv]: {noformat} So, something like a sum() GROUP BY will create a few hundred thousand AbstractAggregationBuffer objects all of which will suddenly go out of scope when the map.aggr flushes it down to the sort buffer. That particular GC collection takes forever because the tiny buffers take a lot of time to walk over and then they leave the memory space fragmented, which requires a compaction pass (which btw, writes to a page-interleaved NUMA zone). And to make things worse, the pre-allocated sort buffers with absolutely zero data in them take up most of the tenured regions causing these chunks of memory to be visited more and more often as they are part of the Eden space. {noformat} We need flat data structures to be GC friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
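A minimal sketch (not Hive's implementation) of the "flat data structures" idea in the quote above: all partial sums live in one primitive array indexed by group slot, so a map-side flush frees a single array rather than hundreds of thousands of tiny per-key aggregation-buffer objects:

```java
class FlatSumAggregator {
    // One slot per group key: no per-key objects for the GC to walk,
    // and no fragmentation when everything goes out of scope at flush.
    private final double[] sums;

    FlatSumAggregator(int maxGroups) {
        sums = new double[maxGroups];
    }

    // Accumulates a value for the given group slot.
    void add(int slot, double value) {
        sums[slot] += value;
    }

    double sumFor(int slot) {
        return sums[slot];
    }
}
```

Mapping group keys to slot indices (e.g. via a hash table of primitives) is left out of the sketch; the point is only that the aggregation state itself is flat.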
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697371#comment-14697371 ] Aaron Tokhy commented on HIVE-10631: Sure, here it is: https://reviews.apache.org/r/37484/ create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions. {code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and doing nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. 
Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: basically it could scan the wrong S3 directory recursively and do nothing with the result. I will add more details of the cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697368#comment-14697368 ] Alan Gates commented on HIVE-11500: --- bq. I think we should use YAGNI principle... That's fine, but it means on the next interface you want to add you have to convince me that you should add another set of calls rather than refactor this one to be generic. bq. Having many methods on metastore is not really that big of a deal, since they do different things. I disagree. Having just implemented a new version of RawStore I can tell you that it took me a long time to understand the nuances of why there's five different ways to fetch partitions. I'm still not sure I have it all straight. We should not just add a new call each time because it's the shortest path to get the new thing working. We need to think about code maintenance and understandability for future developers. implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. 
It might be different for splits. In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11547) beeline does not continue running the script after an error occurs while beeline --force=true is already set.
[ https://issues.apache.org/jira/browse/HIVE-11547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697535#comment-14697535 ] Yongzhi Chen commented on HIVE-11547: - HIVE-11203 has fixed the issue. beeline does not continue running the script after an error occurs while beeline --force=true is already set. --- Key: HIVE-11547 URL: https://issues.apache.org/jira/browse/HIVE-11547 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0 Environment: HDP 2.3 on Virtual box Reporter: Wei Huang If you execute beeline to run a SQL script file, using the following command: beeline -f <query file name>, beeline exits after the first error, i.e. when a test query fails, beeline quits to the CLI. The --force=true option seems to have a bug: it does not continue running the script after an error occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11565) LLAP: Tez counters for LLAP
[ https://issues.apache.org/jira/browse/HIVE-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697674#comment-14697674 ] Sergey Shelukhin commented on HIVE-11565: - [~gopalv] [~sseth] fyi LLAP: Tez counters for LLAP --- Key: HIVE-11565 URL: https://issues.apache.org/jira/browse/HIVE-11565 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin 1) Tez counters for LLAP are incorrect. 2) Some counters, such as cache hit ratio for a fragment, are not propagated. We need to make sure that Tez counters for LLAP are usable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9756) LLAP: use log4j 2 for llap
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9756: --- Assignee: Prasanth Jayachandran (was: Gopal V) LLAP: use log4j 2 for llap -- Key: HIVE-9756 URL: https://issues.apache.org/jira/browse/HIVE-9756 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Prasanth Jayachandran For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get throughput friendly logging. http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697664#comment-14697664 ] Sergey Shelukhin edited comment on HIVE-11500 at 8/14/15 8:09 PM: -- Actually the main reason all these calls exist for partitions is because they use args instead of request-response pattern, which makes it impossible to change the signature in a backward-compatible manner. I will happily refactor the newly added calls to be generic (req/resp should allow for that), or deprecate them in favor of generic calls and remove later, if the need arises. was (Author: sershe): Actually the main reason all these calls exist for partitions is because they use args instead of request-response pattern, which makes it impossible to change the signature in a backward-compatible manner. I will happily refactor these calls to be generic, or deprecate them in favor of generic calls and remove later, if the need arises. implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. 
It might be different for splits. In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
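The args-vs-request/response point in the comment above can be illustrated with a toy example (all names invented, not the actual metastore Thrift API): a method that takes a request object can gain optional fields later without changing its signature, which is what keeps the call backward-compatible:

```java
class ReqRespSketch {
    // Hypothetical request type. New optional fields can be appended over
    // time; old callers simply leave them at their defaults.
    static class GetFileMetadataRequest {
        long[] fileIds;              // original field
        boolean doGetFooters;        // added later; no signature change needed
    }

    // Hypothetical response type; also extensible the same way.
    static class GetFileMetadataResult {
        int metadataCount;
    }

    // The method signature never changes as the request type grows —
    // unlike a positional-args method, which would need a new overload.
    static GetFileMetadataResult getFileMetadata(GetFileMetadataRequest req) {
        GetFileMetadataResult res = new GetFileMetadataResult();
        res.metadataCount = req.fileIds == null ? 0 : req.fileIds.length;
        return res;
    }
}
```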
[jira] [Commented] (HIVE-11535) LLAP: move EncodedTreeReaderFactory, TreeReaderFactory bits that rely on orc.encoded, and StreamUtils if needed, to orc.encoded package
[ https://issues.apache.org/jira/browse/HIVE-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697686#comment-14697686 ] Prasanth Jayachandran commented on HIVE-11535: -- LGTM, +1 LLAP: move EncodedTreeReaderFactory, TreeReaderFactory bits that rely on orc.encoded, and StreamUtils if needed, to orc.encoded package --- Key: HIVE-11535 URL: https://issues.apache.org/jira/browse/HIVE-11535 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11535.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens
[ https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11566: - Description: Currently it's allocating one write buffer for a number of hash partitions up front, which can cause GC pause. It's better to do the write buffer allocation on demand. was: Currently it's allocating one write buffer for a number of hash partitions up front, which causes GC pause. It's better to do the write buffer allocation on demand. Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens -- Key: HIVE-11566 URL: https://issues.apache.org/jira/browse/HIVE-11566 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng Currently it's allocating one write buffer for a number of hash partitions up front, which can cause GC pause. It's better to do the write buffer allocation on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
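The on-demand allocation the description asks for might look roughly like this — a standalone sketch, not the actual hybrid grace hash join code; all names are invented:

```java
class LazyPartitionBuffers {
    // One slot per hash partition; all null until a partition's first write.
    private final byte[][] buffers;
    private final int bufferSize;
    private int allocated = 0;

    LazyPartitionBuffers(int numPartitions, int bufferSize) {
        this.buffers = new byte[numPartitions][];
        this.bufferSize = bufferSize;
    }

    // Allocates a partition's write buffer only on its first write,
    // instead of allocating one per partition up front.
    byte[] bufferFor(int partition) {
        if (buffers[partition] == null) {
            buffers[partition] = new byte[bufferSize];
            allocated++;
        }
        return buffers[partition];
    }

    int allocatedCount() {
        return allocated;
    }
}
```

Partitions that never receive a write never pay the allocation, which is the GC-pressure win described above.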
[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens
[ https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11566: - Affects Version/s: 1.2.0 Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens -- Key: HIVE-11566 URL: https://issues.apache.org/jira/browse/HIVE-11566 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Currently it's allocating one write buffer for a number of hash partitions up front, which can cause GC pause. It's better to do the write buffer allocation on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11524) LLAP: tez.runtime.compress doesn't appear to be honored for LLAP
[ https://issues.apache.org/jira/browse/HIVE-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11524: Fix Version/s: llap LLAP: tez.runtime.compress doesn't appear to be honored for LLAP Key: HIVE-11524 URL: https://issues.apache.org/jira/browse/HIVE-11524 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth Fix For: llap When running llap on an openstack cluster without snappy installed, with tez.runtime.compress set to false and the codec set to snappy, one still gets exceptions due to the snappy codec being absent: {noformat} 2015-08-10 11:14:30,440 [TezTaskRunner_attempt_1438943112941_0015_2_00_00_0(attempt_1438943112941_0015_2_00_00_0)] ERROR org.apache.hadoop.io.compress.snappy.SnappyCompressor: failed to load SnappyCompressor java.lang.NoSuchFieldError: clazz at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method) at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165) at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:153) at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:138) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:406) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:367) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.finalSpill(UnorderedPartitionedKVWriter.java:612) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.close(UnorderedPartitionedKVWriter.java:521) at org.apache.tez.runtime.library.output.UnorderedKVOutput.close(UnorderedKVOutput.java:128) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:376) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:79) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} When it's set to true, the client complains about snappy. When it's set to false, the client doesn't complain but it still tries to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11524) LLAP: tez.runtime.compress doesn't appear to be honored for LLAP
[ https://issues.apache.org/jira/browse/HIVE-11524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-11524. - Resolution: Cannot Reproduce. User error/misunderstanding. LLAP: tez.runtime.compress doesn't appear to be honored for LLAP Key: HIVE-11524 URL: https://issues.apache.org/jira/browse/HIVE-11524 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth When running llap on an openstack cluster without snappy installed, with tez.runtime.compress set to false and the codec set to snappy, one still gets exceptions due to the snappy codec being absent: {noformat} 2015-08-10 11:14:30,440 [TezTaskRunner_attempt_1438943112941_0015_2_00_00_0(attempt_1438943112941_0015_2_00_00_0)] ERROR org.apache.hadoop.io.compress.snappy.SnappyCompressor: failed to load SnappyCompressor java.lang.NoSuchFieldError: clazz at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method) at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165) at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:153) at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.<init>(IFile.java:138) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:406) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$SpillCallable.callInternal(UnorderedPartitionedKVWriter.java:367) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.finalSpill(UnorderedPartitionedKVWriter.java:612) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.close(UnorderedPartitionedKVWriter.java:521) at org.apache.tez.runtime.library.output.UnorderedKVOutput.close(UnorderedKVOutput.java:128) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:376) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:79) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} When it's set to true, the client complains about snappy. When it's set to false, the client doesn't complain but it still tries to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11525) Bucket pruning
[ https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11525: Assignee: Takuya Fukudome Bucket pruning -- Key: HIVE-11525 URL: https://issues.apache.org/jira/browse/HIVE-11525 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0 Reporter: Maciek Kocon Assignee: Takuya Fukudome Labels: gsoc2015 Logically and functionally bucketing and partitioning are quite similar - both provide a mechanism to segregate and separate the table's data based on its content. Thanks to that, significant further optimisations like [partition] PRUNING or [bucket] MAP JOIN are possible. The difference seems to be imposed by design, where PARTITIONing is open/explicit while BUCKETing is discrete/implicit. Partitioning seems to be very common, if not a standard feature, in all current RDBMS, while BUCKETING seems to be Hive specific only. In a way, BUCKETING could also be called hashing, or simply IMPLICIT PARTITIONING. Regardless of the fact that these two are recognised as two separate features available in Hive, there should be nothing to prevent leveraging the same existing query/join optimisations across the two. BUCKET pruning: enable the partition-PRUNING-equivalent optimisation for queries on BUCKETED tables. The simplest example is for queries like SELECT … FROM x WHERE colA=123123 to read only the relevant bucket file rather than all the file-buckets that belong to the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10144) [LLAP] merge brought in file blocking github sync
[ https://issues.apache.org/jira/browse/HIVE-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697643#comment-14697643 ] Sergey Shelukhin commented on HIVE-10144: - [~hagleitn] [~gopalv] [~vikram.dixit] [~sseth] [~prasanth_j] maybe now is a good time to destroy the history of the llap branch? :) We can rebase to exclude the large file from history, and also rename all the commits that are not attached to JIRAs. [LLAP] merge brought in file blocking github sync - Key: HIVE-10144 URL: https://issues.apache.org/jira/browse/HIVE-10144 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Szehon Ho Assignee: Gunther Hagleitner r1669718 brought in a file that is not in source control on the llap branch: [http://svn.apache.org/repos/asf/hive/branches/llap/itests/thirdparty/|http://svn.apache.org/repos/asf/hive/branches/llap/itests/thirdparty/] It is a file downloaded during the test build and should not be in source control. It is actually blocking the github sync as it's too large. See INFRA-9360 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697664#comment-14697664 ] Sergey Shelukhin commented on HIVE-11500: - Actually the main reason all these calls exist for partitions is because they use args instead of request-response pattern, which makes it impossible to change the signature in a backward-compatible manner. I will happily refactor these calls to be generic, or deprecate them in favor of generic calls and remove later, if the need arises. implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11500) implement file footer / splits cache in HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697666#comment-14697666 ] Sergey Shelukhin commented on HIVE-11500: - Btw, you should review the API patch in HIVE-11552 ;) implement file footer / splits cache in HBase metastore --- Key: HIVE-11500 URL: https://issues.apache.org/jira/browse/HIVE-11500 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBase metastore split cache.pdf We need to cache file metadata (e.g. ORC file footers) for split generation (which, on FSes that support fileId, will be valid permanently and only needs to be removed lazily when ORC file is erased or compacted), and potentially even some information about splits (e.g. grouping based on location that would be good for some short time), in HBase metastore. -It should be queryable by table. Partition predicate pushdown should be supported. If bucket pruning is added, that too.- Given that we cannot cache file lists (we have to check FS for new/changed files anyway), and the difficulty of passing of data about partitions/etc. to split generation compared to paths, we will probably just filter by paths and fileIds. It might be different for splits In later phases, it would be nice to save the (first category above) results of expensive work done by jobs, e.g. data size after decompression/decoding per column, etc. to avoid surprises when ORC encoding is very good, or very bad. Perhaps it can even be lazily generated. Here's a pony: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11566) Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens
[ https://issues.apache.org/jira/browse/HIVE-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-11566: - Description: Currently it's allocating one write buffer for a number of hash partitions up front, which causes GC pauses. It's better to do the write buffer allocation on demand. was: Currently it's allocating a write buffer for a fixed number of hash partitions up front, which causes GC pauses. It's better to do the write buffer allocation on demand. Hybrid grace hash join should only allocate write buffer for a hash partition when first write happens -- Key: HIVE-11566 URL: https://issues.apache.org/jira/browse/HIVE-11566 Project: Hive Issue Type: Bug Components: Hive Reporter: Wei Zheng Assignee: Wei Zheng Currently it's allocating one write buffer for a number of hash partitions up front, which causes GC pauses. It's better to do the write buffer allocation on demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
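The on-demand allocation proposed in HIVE-11566 can be sketched as follows. This is a hedged, minimal sketch with hypothetical names (`LazyPartitionBuffers`, `bufferFor`), not Hive's actual HybridHashTableContainer code: the buffer for a hash partition is created only when its first write arrives, so partitions that never receive a row never pay the allocation cost.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyPartitionBuffers {
    static final int BUFFER_SIZE = 4 * 1024;
    // Partition number -> write buffer; entries appear only on first write.
    private final Map<Integer, byte[]> buffers = new HashMap<>();
    int allocations = 0;

    // Returns the partition's write buffer, allocating it lazily on first use
    // instead of allocating one buffer per partition up front.
    byte[] bufferFor(int partition) {
        return buffers.computeIfAbsent(partition, p -> {
            allocations++; // count real allocations for illustration
            return new byte[BUFFER_SIZE];
        });
    }
}
```

With this shape, a join that only ever touches a few partitions allocates only those few buffers, which is the GC-pressure reduction the description is after.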
[jira] [Commented] (HIVE-8898) Remove HIVE-8874 once HBASE-12493 is fixed
[ https://issues.apache.org/jira/browse/HIVE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697428#comment-14697428 ] Swarnim Kulkarni commented on HIVE-8898: I logged a JIRA here[1] to revert the work done. [1] https://issues.apache.org/jira/browse/HIVE-11559 Remove HIVE-8874 once HBASE-12493 is fixed -- Key: HIVE-8898 URL: https://issues.apache.org/jira/browse/HIVE-8898 Project: Hive Issue Type: Task Components: HBase Handler Reporter: Brock Noland Assignee: Yongzhi Chen Priority: Blocker Fix For: 1.2.0 Attachments: HIVE-8898.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697453#comment-14697453 ] Alan Gates commented on HIVE-11317: --- +1 ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: triage Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(). This is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, there is nothing to abort dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system. Also, the streaming API doesn't call DbTxnManager at all. Need to move this logic into Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. Also move timeOutLocks() there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example. The symptom of this is that the system keeps showing more and more open transactions that don't seem to ever go away (and have no locks associated with them) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11304: - Attachment: HIVE-11304.11.patch For some reason the templeton.cmd file did not apply cleanly on master, but the precommit test did not have that problem. Uploading the clean patch for future reference. Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11304.10.patch, HIVE-11304.11.patch, HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch Log4j2 has some great benefits and can benefit Hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled, etc.) More details can be found here: https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - route logs to different log files based on MDC context (useful for HS2, LLAP, etc.) 3) Asynchronous logging This is an umbrella JIRA to track changes related to the Log4j2 migration. Log4j 1 EOL - https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces -- This message was sent by Atlassian JIRA (v6.3.4#6332)
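The "performance when logging is disabled" benefit mentioned above comes from parametrized/lazy message construction: with Log4j2-style `Supplier`-based logging, the message is never built when the level is off. The toy below illustrates the mechanism without depending on the Log4j2 library itself; `LazyLogDemo`, `debug`, and `expensiveMessage` are hypothetical names invented for this sketch.

```java
import java.util.function.Supplier;

public class LazyLogDemo {
    static boolean debugEnabled = false; // stands in for logger.isDebugEnabled()
    static int messagesBuilt = 0;

    // An expensive-to-build message; with eager string concatenation this cost
    // is paid even when the debug level is disabled.
    static String expensiveMessage() {
        messagesBuilt++;
        return "state=" + System.nanoTime();
    }

    // Mimics Log4j2's logger.debug(Supplier<?>): the supplier only runs when
    // the level is enabled, so disabled logging is nearly free.
    static void debug(Supplier<String> msg) {
        if (debugEnabled) {
            System.out.println(msg.get());
        }
    }
}
```

Compare `debug(LazyLogDemo::expensiveMessage)` with an eager `debug(expensiveMessage())`: the eager form always builds the string, the lazy form only when the level is on.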
[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools
[ https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-11558: --- Description: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56) at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650) at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165) at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514) at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} What's interesting is that the table works fine in Hive when selecting out of it, even when doing select * on the whole table and letting it run to the end (it's a sample data set); it only causes problems for other tools. All fields are string except for the first one, which is timestamp, but this is not that known issue, since if I create another table with 3 fields including the timestamp and two string fields it works fine in other tools. The only thing I can see which appears to cause this is that the other fields have lots of NULLs in them, as those json fields may or may not be present. I've converted this exact same json data set to parquet using Apache Drill and also using Apache Spark SQL, and both of those tools create parquet files from this data set as a straight conversion that are fine when accessed via Parquet tools or Drill or Spark or Hive (using an external Hive table definition layered over the generated parquet files). This implies that it's Hive's generation of Parquet that is broken, since both Drill and Spark can convert the dataset from JSON to Parquet without any issues on reading the files back in any of the other tools. was: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. 
Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at
[jira] [Commented] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697438#comment-14697438 ] Eugene Koifman commented on HIVE-11317: --- Patch 4 includes changes to tests such that they don't rely on timing, and better comments. The reason for a separate thread is modularity and testing. For example, if the timed-out transaction reaper is not keeping up, it won't interfere with compaction scheduling and vice versa. It can also be configured separately, and it makes testing easier. I think HousekeeperService is a nice abstraction for later, when we add alerting capability and perhaps an isAlive service for compaction processes. performTimeouts(): it's more efficient to read 2500 entries from TXNS than to send 25 queries, and we can easily cache the result since it's just a list of longs. The rest of the logic runs each batch in a separate transaction to keep lock duration shorter - hopefully reducing the number of retries due to deadlocks. ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: triage Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(). This is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, there is nothing to abort dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system. Also, the streaming API doesn't call DbTxnManager at all. Need to move this logic into Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. 
Also move timeOutLocks() there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example. The symptom of this is that the system keeps showing more and more open transactions that don't seem to ever go away (and have no locks associated with them) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
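The batching scheme Eugene describes (one read of ~2500 timed-out txn ids, then aborts issued in small groups, each group in its own metastore transaction to keep lock hold times short) can be sketched as below. `BatchedAbort` and `batches` are hypothetical names for illustration, not the actual TxnHandler code.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedAbort {
    // Split the full list of timed-out txn ids into fixed-size batches.
    // Each batch would then be aborted in its own short metastore transaction,
    // reducing lock duration and the chance of deadlock retries.
    static List<List<Long>> batches(List<Long> txnIds, int batchSize) {
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i < txnIds.size(); i += batchSize) {
            out.add(new ArrayList<>(txnIds.subList(i, Math.min(i + batchSize, txnIds.size()))));
        }
        return out;
    }
}
```

Fetching the candidate ids once and batching the work is the "one query of 2500 rows instead of 25 queries" trade-off from the comment: the id list is cheap to cache (just longs), while the write side stays in small transactions.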
[jira] [Commented] (HIVE-11557) CBO (Calcite Return Path): Convert to flat AND/OR
[ https://issues.apache.org/jira/browse/HIVE-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697437#comment-14697437 ] Ashutosh Chauhan commented on HIVE-11557: - +1 CBO (Calcite Return Path): Convert to flat AND/OR - Key: HIVE-11557 URL: https://issues.apache.org/jira/browse/HIVE-11557 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11557.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11560: Attachment: HIVE-11560.1.patch.txt Patch attached. Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11560.1.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697497#comment-14697497 ] Daniel Dai commented on HIVE-10289: --- I am doing the comparison in the actual datatype. HBase bytes are converted to the actual datatype using BinarySortableSerDe. Do you mean Operator.val? That is a string and means different things depending on the operator it handles. For LIKE, it is the regex string. For NOTEQUALS, it is the value to compare against, and yes, this can be optimized by converting to the actual type at init time rather than in compareTo(). But that's operator dependent. Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch Currently, partition filtering only handles the first partition key, and the type for this partition key must be string. In order to break this limitation, several improvements are required: 1. Change the serialization format for partition keys. Currently partition keys are serialized into a delimited string, which sorts in string order, not according to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter conditions not on the initial partition key, push them into HBase RowFilter. RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
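The motivation for BinarySortableSerDe in point 1 above is that delimited-string serialization sorts in string order, which disagrees with numeric order, so range scans over typed partition keys return wrong results. A one-method sketch (the class name `SortOrderDemo` is invented for illustration):

```java
public class SortOrderDemo {
    // Compares two longs the way a delimited-string serialization would:
    // lexicographically on their decimal text. This disagrees with numeric
    // order (e.g. "10" sorts before "9"), which is exactly what a
    // binary-sortable encoding fixes by making byte order match value order.
    static boolean stringLess(long a, long b) {
        return String.valueOf(a).compareTo(String.valueOf(b)) < 0;
    }
}
```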
[jira] [Updated] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception
[ https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11562: - Attachment: HIVE-11562.patch Typo in hive-log4j2.xml throws unknown level exception -- Key: HIVE-11562 URL: https://issues.apache.org/jira/browse/HIVE-11562 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11562.patch Noticing some typo in default hive-log4j2.xml used for tests causing the following exception {code} 2015-08-14 11:26:35,965 WARN Error while converting string [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [{SYS:HIVE.LOG.LEVEL}]. at org.apache.logging.log4j.Level.valueOf(Level.java:286) at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230) at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226) at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336) at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130) at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45) at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247) at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136) at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766) at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706) at 
org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698) at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698) at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358) at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161) at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426) at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442) at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147) at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41) at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102) at org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42) at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40) at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55) at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657) at org.apache.hadoop.util.ShutdownHookManager.clinit(ShutdownHookManager.java:44) at org.apache.hadoop.util.RunJar.run(RunJar.java:200) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11493) Predicate with integer column equals double evaluates to false
[ https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11493: Component/s: Query Planning Predicate with integer column equals double evaluates to false -- Key: HIVE-11493 URL: https://issues.apache.org/jira/browse/HIVE-11493 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Pengcheng Xiong Priority: Blocker Fix For: 2.0.0 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch, HIVE-11493.03.patch, HIVE-11493.04.patch Filters with an integer column equal to a double constant evaluate to false every time. A negative double constant works fine. {code:title=explain select * from orc_ppd where t = 10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:false (type: boolean) TableScan [TS_0] alias:orc_ppd {code} {code:title=explain select * from orc_ppd where t = -10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:(t = (- 10.0)) (type: boolean) TableScan [TS_0] alias:orc_ppd {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
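The explain plans above show `t = 10.0` folded to `predicate:false` during planning. A correct constant fold for `int_col = double_const` may only conclude "always false" when the double constant has no exact integer representation. The sketch below is a hypothetical illustration of that check, not the actual Hive fix in the attached patches:

```java
public class IntDoubleFold {
    // Returns the integer literal the predicate can be rewritten to
    // (e.g. t = 10.0 becomes t = 10), or null when the constant cannot be
    // represented exactly as an integer, in which case equality truly is
    // always false (e.g. t = 10.5).
    static Long rewriteEquals(double constant) {
        long asLong = (long) constant;
        return ((double) asLong == constant) ? Long.valueOf(asLong) : null;
    }
}
```

Under this rule, `t = 10.0` and `t = -10.0` both rewrite to integer comparisons rather than folding to `false`, matching the expected behavior the bug report contrasts against.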
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: (was: HIVE-11424.01.patch) Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools
[ https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-11558: --- Description: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56) at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650) at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165) at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514) at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} What's interesting is that the table works fine in Hive when selecting out of it, even when doing select * on the whole table and letting it run to the end (it's a sample data set), it's only other tools it causes problems for. All fields are string except for the first one which is timestamp, but this is not that known issue since if I create another parquet table with 3 fields including the timestamp and two string fields using CTAS those hive generated parquet files works fine in the other tools. The only thing I can see which appears to cause this is the other fields have lots of NULLs in them as those json fields may or may not be present. I've converted this exact same json data set to parquet using Apache Drill and also using Apache Spark SQL and both of those tools create parquet files from this data set as a straight conversion that are fine when accessed via Parquet tools or Drill or Spark or Hive (using an external Hive table definition layered over the generated parquet files). This implies that it's Hive's generation of Parquet that is broken since both Drill and Spark can convert the dataset from JSON to Parquet without any issues on reading the files back in any of other tools. was: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. 
Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at
[jira] [Commented] (HIVE-10276) Implement date_format(timestamp, fmt) UDF
[ https://issues.apache.org/jira/browse/HIVE-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697363#comment-14697363 ] Alexander Pivovarov commented on HIVE-10276: The Hive documentation for the date_format UDF clearly says - Supported formats are Java SimpleDateFormat formats https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions Feel free to submit a patch for a date_format_mysql UDF Implement date_format(timestamp, fmt) UDF - Key: HIVE-10276 URL: https://issues.apache.org/jira/browse/HIVE-10276 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Fix For: 1.2.0 Attachments: HIVE-10276.01.patch date_format(date/timestamp/string, fmt) converts a date/timestamp/string to a value of String in the format specified by the Java date format fmt. Supported formats are listed here: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
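Since the UDF documents Java SimpleDateFormat semantics, the pattern behavior it relies on can be shown directly with `java.text.SimpleDateFormat`. This standalone snippet (the helper class `DateFormatDemo` is invented here; the time zone is pinned to UTC only to make the output deterministic) is not the UDF implementation itself:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateFormatDemo {
    // Formats an epoch-millis instant with a SimpleDateFormat pattern,
    // the same pattern language date_format(date/timestamp/string, fmt) documents.
    static String format(long epochMillis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }
}
```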
[jira] [Updated] (HIVE-11493) Predicate with integer column equals double evaluates to false
[ https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11493: Fix Version/s: 2.0.0 Predicate with integer column equals double evaluates to false -- Key: HIVE-11493 URL: https://issues.apache.org/jira/browse/HIVE-11493 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Pengcheng Xiong Priority: Blocker Fix For: 2.0.0 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch, HIVE-11493.03.patch, HIVE-11493.04.patch Filters with integer column equals double constant evaluates to false everytime. Negative double constant works fine. {code:title=explain select * from orc_ppd where t = 10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:false (type: boolean) TableScan [TS_0] alias:orc_ppd {code} {code:title=explain select * from orc_ppd where t = -10.0;} OK Stage-0 Fetch Operator limit:-1 Select Operator [SEL_2] outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13] Filter Operator [FIL_1] predicate:(t = (- 10.0)) (type: boolean) TableScan [TS_0] alias:orc_ppd {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: (was: HIVE-11424.01.patch) Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x
[ https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11304: - Component/s: Logging Migrate to Log4j2 from Log4j 1.x Key: HIVE-11304 URL: https://issues.apache.org/jira/browse/HIVE-11304 Project: Hive Issue Type: Improvement Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11304.10.patch, HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch Log4J2 has some great benefits and can benefit hive significantly. Some notable features include 1) Performance (parametrized logging, performance when logging is disabled etc.) More details can be found here https://logging.apache.org/log4j/2.x/performance.html 2) RoutingAppender - Route logs to different log files based on MDC context (useful for HS2, LLAP etc.) 3) Asynchronous logging This is an umbrella jira to track changes related to Log4j2 migration. Log4J1 EOL - https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11534) Improve validateTableCols error message
[ https://issues.apache.org/jira/browse/HIVE-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11534: --- Fix Version/s: 1.3.0 Improve validateTableCols error message --- Key: HIVE-11534 URL: https://issues.apache.org/jira/browse/HIVE-11534 Project: Hive Issue Type: Improvement Components: Hive Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11534.patch For tables created without a column definition in the DDL (but referencing the schema in the underlying file format like Avro), ObjectStore.validateTableCols throws an exception that doesn't include the table and db name. This makes it tedious to look up the table name in schema files. Example: {code} ERROR org.apache.hadoop.hive.metastore.ObjectStore: Error retrieving statistics via jdo MetaException(message:Column wpp_mbrshp_hix_ik doesn't exist.) at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:6061) at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6012) at org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:160) at org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6084) at org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6076) {code} We should add the database and table name to the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
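The improvement being proposed amounts to qualifying the message with the database and table name, so `message:Column wpp_mbrshp_hix_ik doesn't exist.` becomes traceable to a specific table. A hedged sketch of such a message builder (class, method, and the example table name `default.events` are all hypothetical, not the HIVE-11534 patch):

```java
public class ColErrorMessage {
    // Builds a validateTableCols-style error message that includes the
    // database and table name alongside the missing column.
    static String message(String dbName, String tableName, String colName) {
        return "Column " + colName + " doesn't exist in table " + dbName + "." + tableName;
    }
}
```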
[jira] [Commented] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697482#comment-14697482 ] Swarnim Kulkarni commented on HIVE-11561: - I actually logged this one, but it seems it's not really needed, as we decided to move master forward to 1.x. So we would be keeping this NP change on master and then upgrading it to HBase 1.x as part of [1]. Marking as won't fix. [1] https://issues.apache.org/jira/browse/HIVE-10491 Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-11561. - Resolution: Won't Fix Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11558) Hive generates Parquet files with broken footers, causes NullPointerException in Spark / Drill / Parquet tools
[ https://issues.apache.org/jira/browse/HIVE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-11558: --- Description: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:54) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:53) at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:56) at scala.collection.parallel.mutable.ParArray$Map.tryLeaf(ParArray.scala:650) at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:165) at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514) at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} What's interesting is that the table works fine in Hive when selecting out of it, even when doing select * on the whole table and letting it run to the end (it's a sample data set), it's only other tools it causes problems for. All fields are string except for the first one which is timestamp, but this is not that known issue since if I create another parquet table with 3 fields including the timestamp and two string fields using CTAS those hive generated parquet files works fine in the other tools. The only thing I can see which appears to cause this is the other fields have lots of NULLs in them as those json fields may or may not be present. I've converted this exact same json data set to parquet using Apache Drill and also using Apache Spark SQL and both of those tools create parquet files from this data set as a straight conversion that are fine when accessed via Parquet tools or Drill or Spark or Hive (using an external Hive table definition layered over the generated parquet files). This implies that it's Hive's generation of Parquet that is broken since both Drill and Spark can convert the dataset from JSON to Parquet without any issues on reading the files back in any of the other mentioned tools. was: When creating a Parquet table in Hive from a table in another format (in this case JSON) using CTAS, the generated parquet files are created with broken footers and cause NullPointerExceptions in both Parquet tools and Spark when reading the files directly. 
Here is the error from parquet tools: {code}Could not read footer: java.lang.NullPointerException{code} Here is the error from Spark reading the parquet file back: {code}java.lang.NullPointerException at parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:249) at parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:543) at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:520) at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:426) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297) at scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658) at
[jira] [Updated] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11317: -- Attachment: HIVE-11317.4.patch ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: triage Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(). This is only called when DbTxnManager.getValidTxns() is called. So if there are a lot of txns that need to be timed out and there are no SQL clients talking to the system, nothing aborts the dead transactions, and thus compaction can't clean them up, so garbage accumulates in the system. Also, the streaming API doesn't call DbTxnManager at all. Need to move this logic into Initiator (or some other metastore-side thread). Also, make sure it is broken up into multiple small(er) transactions against the metastore DB. Also move timeOutLocks() there as well. See about adding a TXNS.COMMENT field which can be used for "Auto aborted due to timeout", for example. The symptom of this is that the system keeps showing more and more Open transactions that don't seem to ever go away (and have no locks associated with them) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11559) Revert work done in HIVE-8898
[ https://issues.apache.org/jira/browse/HIVE-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11559: --- Assignee: Swarnim Kulkarni Revert work done in HIVE-8898 - Key: HIVE-11559 URL: https://issues.apache.org/jira/browse/HIVE-11559 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni We unfortunately need to revert the work done in HIVE-8898 as it is non-passive with the older hbase versions. We need to revert this from branch-1 and commit this onto master to maintain passivity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11560: --- Assignee: Swarnim Kulkarni Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11561: --- Assignee: Swarnim Kulkarni Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697466#comment-14697466 ] Swarnim Kulkarni commented on HIVE-11560: - [~sershe] Mind reviewing this for me? Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11560.1.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: HIVE-11424.01.patch Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.01.patch, HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11278) Partition.setOutputFormatClass should not do toString for Class object
[ https://issues.apache.org/jira/browse/HIVE-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697512#comment-14697512 ] Ashutosh Chauhan commented on HIVE-11278: - [~prongs] Will it be possible to add a test case which fails in absence of this patch? Partition.setOutputFormatClass should not do toString for Class object --- Key: HIVE-11278 URL: https://issues.apache.org/jira/browse/HIVE-11278 Project: Hive Issue Type: Bug Reporter: Rajat Khandelwal Assignee: Rajat Khandelwal Fix For: 2.0.0 Attachments: HIVE-11278.01.patch https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L286 inside setInputFormatClass, we're doing: {noformat} public void setInputFormatClass(Class<? extends InputFormat> inputFormatClass) { this.inputFormatClass = inputFormatClass; tPartition.getSd().setInputFormat(inputFormatClass.getName()); } {noformat} But inside setOutputFormatClass, we're doing toString for the class, instead of getName(): {noformat} public void setOutputFormatClass(Class<? extends HiveOutputFormat> outputFormatClass) { this.outputFormatClass = outputFormatClass; tPartition.getSd().setOutputFormat(HiveFileFormatUtils .getOutputFormatSubstitute(outputFormatClass).toString()); } {noformat} The difference is that, for a class A, toString() returns "class A" while getName() returns "A". So Class.forName(cls.getName()) succeeds, but Class.forName(cls.toString()) is not valid. So if you get a partition, set the output format, and make an alter call, then get the partition again and make a getOutputFormatClass call on that object, it throws a ClassNotFoundException at https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java#L316, because it's basically calling Class.forName("class a.b.c.ClassName") which is wrong! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
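The toString()/getName() mismatch described in this report can be reproduced in isolation; a minimal sketch, using java.util.ArrayList as a stand-in for Hive's output-format class:

```java
// Demonstrates why Class.forName(cls.toString()) fails while
// Class.forName(cls.getName()) succeeds: toString() prepends "class ",
// which is not a valid binary class name.
public class ClassNameDemo {
    public static void main(String[] args) throws Exception {
        Class<?> cls = java.util.ArrayList.class;

        System.out.println(cls.getName());  // java.util.ArrayList
        System.out.println(cls);            // class java.util.ArrayList

        Class.forName(cls.getName());       // succeeds
        try {
            Class.forName(cls.toString());
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```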
[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key
[ https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697356#comment-14697356 ] Alan Gates commented on HIVE-10289: --- I'm not done reviewing the patch yet, but I have one bigger question: it looks like we're doing the comparison in String format (in PartitionKeyComparator.compareTo we convert the byte[] value passed in into a String and the values passed in the protobuf are string). Why pay the cost of the string conversion? Why not leave it in byte[] and use bytes in the protobuf? This seems like it would be faster since this filter will be applied to every row in the scan range. [~sershe], I think the value of ObjectInspectors over OrderedBytes is that there's guaranteed to be an ObjectInspector for every Hive type, whereas there are some Hive types not covered by OrderedBytes (e.g. Date, Timestamp). Support filter on non-first partition key and non-string partition key -- Key: HIVE-10289 URL: https://issues.apache.org/jira/browse/HIVE-10289 Project: Hive Issue Type: Sub-task Components: HBase Metastore, Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Daniel Dai Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch Currently, partition filtering only handles the first partition key and the type for this partition key must be string. In order to break this limitation, several improvements are required: 1. Change serialization format for partition key. Currently partition keys are serialized into delimited string, which sorted on string order not with regard to the actual type of the partition key. We use BinarySortableSerDe for this purpose. 2. For filter condition not on the initial partition keys, push it into HBase RowFilter. RowFilter will deserialize the partition key and evaluate the filter condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697375#comment-14697375 ] Ashutosh Chauhan commented on HIVE-10631: - Getting this: You don't have access to this review request. This review request is private. You must be a requested reviewer, either directly or on a requested group, and have permission to access the repository in order to view this review request. create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on; however, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't seem to use the result. Fast Stats was implemented by HIVE-3959 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363 From the create_table_core method: {code} if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) && !MetaStoreUtils.isView(tbl)) { if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir); } else { // Partitioned table with no partitions. MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); } } {code} Particularly Line 1363: // Partitioned table with no partitions.
{code} MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true); {code} This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method, because the newDir flag is always true. The impact of this bug is minor with an HDFS warehouse location (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse location, especially for large existing partitions. Also, the impact is heightened with HIVE-6727 when the warehouse location is S3: basically it could scan the wrong S3 directory recursively and do nothing with the result. I will add more detail of the cases in comments -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Description: We create a rule that will transform OR clauses into IN clauses (when possible). (was: 1) Remove early bail out condition. 2) Create IN clause instead of OR tree (when possible).) Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11534) Improve validateTableCols error message
[ https://issues.apache.org/jira/browse/HIVE-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697442#comment-14697442 ] Xuefu Zhang commented on HIVE-11534: Also pushed to branch-1.0. Improve validateTableCols error message --- Key: HIVE-11534 URL: https://issues.apache.org/jira/browse/HIVE-11534 Project: Hive Issue Type: Improvement Components: Hive Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Priority: Minor Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11534.patch For tables created without column definition in the DDL (but referencing the schema in the underlying file format like Avro), ObjectStore.validateTableCols throws an exception that doesn't include the table and db name. This makes it tedious to lookup table name in schema files. Example: {code} ERROR org.apache.hadoop.hive.metastore.ObjectStore: Error retrieving statistics via jdo MetaException(message:Column wpp_mbrshp_hix_ik doesn't exist.) at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:6061) at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6012) at org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:160) at org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6084) at org.apache.hadoop.hive.metastore.ObjectStore$6.getJdoResult(ObjectStore.java:6076) {code} We should add database and the table name to the error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: HIVE-11424.01.patch Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.01.patch, HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11557) CBO (Calcite Return Path): Convert to flat AND/OR
[ https://issues.apache.org/jira/browse/HIVE-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697424#comment-14697424 ] Hive QA commented on HIVE-11557: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12750503/HIVE-11557.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9320 tests executed *Failed tests:* {noformat} TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4967/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4967/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4967/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12750503 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Convert to flat AND/OR - Key: HIVE-11557 URL: https://issues.apache.org/jira/browse/HIVE-11557 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11557.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Improve HivePreFilteringRule performance
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: HIVE-11424.01.patch Improve HivePreFilteringRule performance Key: HIVE-11424 URL: https://issues.apache.org/jira/browse/HIVE-11424 Project: Hive Issue Type: Bug Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11424.01.patch, HIVE-11424.patch We create a rule that will transform OR clauses into IN clauses (when possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11490) Lazily call ASTNode::toStringTree() after tree modification
[ https://issues.apache.org/jira/browse/HIVE-11490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697767#comment-14697767 ] Ashutosh Chauhan commented on HIVE-11490: - +1 Lazily call ASTNode::toStringTree() after tree modification --- Key: HIVE-11490 URL: https://issues.apache.org/jira/browse/HIVE-11490 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11490.1.patch, HIVE-11490.2.patch, HIVE-11490.3.patch Currently, we call toStringTree() as part of HIVE-11316 everytime the tree is modified. This is a bad approach as we can lazily delay this to the point when toStringTree() is called again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7152) OutputJobInfo.setPosOfPartCols() Comparator bug
[ https://issues.apache.org/jira/browse/HIVE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7152: - Assignee: (was: Eugene Koifman) OutputJobInfo.setPosOfPartCols() Comparator bug --- Key: HIVE-7152 URL: https://issues.apache.org/jira/browse/HIVE-7152 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Eugene Koifman this method compares Integer objects using '=='. This may break for wide tables that have more than 127 columns. http://stackoverflow.com/questions/2602636/why-cant-the-compiler-jvm-just-make-autoboxing-just-work -- This message was sent by Atlassian JIRA (v6.3.4#6332)
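The '==' pitfall described in this report comes from Integer autoboxing: the JVM is required to cache boxed values in [-128, 127], so reference comparison happens to work for small column positions and silently fails beyond them (hence "more than 127 columns"). A minimal sketch:

```java
// Integer autoboxing caches values in [-128, 127] (JLS 5.1.7), so '=='
// compares the same cached object for small values but distinct objects
// for larger ones (with the default autobox cache size).
public class AutoboxDemo {
    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100;
        Integer big1 = 200, big2 = 200;

        System.out.println(small1 == small2);  // true: same cached object
        System.out.println(big1 == big2);      // false: reference comparison
        System.out.println(big1.equals(big2)); // true: compare by value instead
    }
}
```

The fix in such a comparator is to use equals() or Integer.compare() rather than '=='.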
[jira] [Resolved] (HIVE-11570) Fix PTest2 log4j2.version
[ https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V resolved HIVE-11570. Resolution: Fixed Fix PTest2 log4j2.version -- Key: HIVE-11570 URL: https://issues.apache.org/jira/browse/HIVE-11570 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11570.1.patch {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. @ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10497) Upgrade hive branch to latest Tez
[ https://issues.apache.org/jira/browse/HIVE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10497: --- Attachment: (was: HIVE-10497.3.patch) Upgrade hive branch to latest Tez - Key: HIVE-10497 URL: https://issues.apache.org/jira/browse/HIVE-10497 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-10497.1.patch, HIVE-10497.1.patch, HIVE-10497.2.patch Upgrade hive to the upcoming tez-0.7 release -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11563) Perflogger loglines are repeated
[ https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-11563. -- Resolution: Fixed Perflogger loglines are repeated Key: HIVE-11563 URL: https://issues.apache.org/jira/browse/HIVE-11563 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11563.patch After HIVE-11304, the perflogger log lines in qtests are repeated. {code} 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11542) port fileId support on shims and splits from llap branch
[ https://issues.apache.org/jira/browse/HIVE-11542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697762#comment-14697762 ] Prasanth Jayachandran commented on HIVE-11542: -- I don't see HdfsUtils class being used anywhere. Remove it? Otherwise looks good to me +1 port fileId support on shims and splits from llap branch Key: HIVE-11542 URL: https://issues.apache.org/jira/browse/HIVE-11542 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch, 2.0.0 Attachments: HIVE-11542.patch This is helpful for any kind of file-based cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11552: Attachment: HIVE-11552.nogen.patch Updated nogen patch to remove some spurious changes implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11570) Fix PTest2 log4j2.version
[ https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11570: --- Description: {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. @ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} NO PRECOMMIT TESTS was: {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. 
@ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} Fix PTest2 log4j2.version -- Key: HIVE-11570 URL: https://issues.apache.org/jira/browse/HIVE-11570 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11570.1.patch {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. @ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697900#comment-14697900 ] Sushanth Sowmyan commented on HIVE-11552: - +cc [~thejas] implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11567) Some trace logs seeped through with new log4j2 changes
[ https://issues.apache.org/jira/browse/HIVE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697746#comment-14697746 ] Prasanth Jayachandran commented on HIVE-11567: -- I don't think this needs a precommit test run as it just reduces the log lines in hive.log when running tests. Some trace logs seeped through with new log4j2 changes -- Key: HIVE-11567 URL: https://issues.apache.org/jira/browse/HIVE-11567 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11567.patch Observed hive.log file size difference when running with new log4j2 changes (HIVE-11304). Looks like the default threshold was DEBUG in log4j1.x (as log4j.threshold was misspelt). In log4j2 the default threshold was set to ALL which emitted some trace logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
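The description says the effective log4j 1.x threshold was DEBUG while the new log4j2 configuration left it at ALL, letting TRACE records through. One plausible shape of the fix, sketched here with illustrative names rather than the actual contents of hive-log4j2.xml:

```xml
<!-- Illustrative fragment: pinning an explicit root level instead of
     ALL keeps TRACE output out of hive.log, matching the effective
     log4j 1.x behavior described in the ticket. -->
<Loggers>
  <Root level="DEBUG">
    <AppenderRef ref="console"/>
  </Root>
</Loggers>
```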
[jira] [Commented] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception
[ https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697744#comment-14697744 ] Prasanth Jayachandran commented on HIVE-11562: -- I don't think this needs a precommit test run as it just avoids errors written to console wrt initialization. Typo in hive-log4j2.xml throws unknown level exception -- Key: HIVE-11562 URL: https://issues.apache.org/jira/browse/HIVE-11562 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11562.patch A typo in the default hive-log4j2.xml used for tests causes the following exception:
{code}
2015-08-14 11:26:35,965 WARN Error while converting string [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using default value [null].
java.lang.IllegalArgumentException: Unknown level constant [{SYS:HIVE.LOG.LEVEL}].
	at org.apache.logging.log4j.Level.valueOf(Level.java:286)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336)
	at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130)
	at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
	at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247)
	at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161)
	at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361)
	at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426)
	at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442)
	at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
	at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102)
	at org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
	at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40)
	at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55)
	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
	at org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
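The exception message itself points at the typo: the string `{sys:hive.log.level}` is missing the leading `$` that makes it a log4j2 property lookup, so it is passed verbatim to `Level.valueOf()`. A sketch of the pattern (element and property names are illustrative, not copied from the actual hive-log4j2.xml):

```xml
<!-- Broken:  <Root level="{sys:hive.log.level}">  -- literal string
     Fixed:   the leading '$' makes it a system-property lookup. -->
<Root level="${sys:hive.log.level}">
  <AppenderRef ref="console"/>
</Root>
```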
[jira] [Commented] (HIVE-11563) Perflogger loglines are repeated
[ https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697745#comment-14697745 ] Prasanth Jayachandran commented on HIVE-11563: -- I don't think this needs a precommit test run as it just reduces the log lines in hive.log when running tests. Perflogger loglines are repeated Key: HIVE-11563 URL: https://issues.apache.org/jira/browse/HIVE-11563 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11563.patch After HIVE-11304, the perflogger log lines in qtests are repeated. {code} 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
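Duplicated lines like these are a classic symptom of log4j2 logger additivity: an event logged through a named logger is also forwarded to the root logger's appenders, so the same appender writes it twice. One plausible fix, sketched here as an assumption rather than the actual HIVE-11563 patch:

```xml
<Loggers>
  <!-- additivity defaults to true; setting it to false stops PERFLOG
       events from also reaching the root logger's appender, which is
       what produces each line twice. -->
  <Logger name="org.apache.hadoop.hive.ql.log.PerfLogger"
          level="INFO" additivity="false">
    <AppenderRef ref="console"/>
  </Logger>
  <Root level="INFO">
    <AppenderRef ref="console"/>
  </Root>
</Loggers>
```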
[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree
[ https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697770#comment-14697770 ] Ashutosh Chauhan commented on HIVE-11341: - [~hsubramaniyan] Are above failures not related to patch or are you working on fixing those? Avoid expensive resizing of ASTNode tree - Key: HIVE-11341 URL: https://issues.apache.org/jira/browse/HIVE-11341 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, HIVE-11341.6.patch, HIVE-11341.7.patch
{code}
Stack Trace                                                                            Sample Count  Percentage(%)
parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                                   1,605         90
parse.CalcitePlanner.analyzeInternal(ASTNode)                                          1,605         90
parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)       1,605         90
parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)               1,604         90
parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)             1,604         90
parse.SemanticAnalyzer.genPlan(QB)                                                     1,604         90
parse.SemanticAnalyzer.genPlan(QB, boolean)                                            1,604         90
parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                                  1,604         90
parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)              1,603         90
parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)                   1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)                  1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)             1,603         90
parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)          1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)                          1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)    1,603         90
lib.DefaultGraphWalker.startWalking(Collection, HashMap)                               1,579         89
lib.DefaultGraphWalker.walk(Node)                                                      1,571         89
java.util.ArrayList.removeAll(Collection)                                              1,433         81
java.util.ArrayList.batchRemove(Collection, boolean)                                   1,433         81
java.util.ArrayList.contains(Object)                                                   1,228         69
java.util.ArrayList.indexOf(Object)                                                    1,228         69
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
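The profile shows the walker spending most of its time in `ArrayList.removeAll()`, which calls `ArrayList.contains()` (a linear scan) once per element, giving quadratic behavior on large ASTs. A sketch of the general idea behind the fix, using a hash-based collection for membership tests; this illustrates the technique, not the actual HIVE-11341 patch code:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of the hot path above: removing "done" nodes from a pending
// list. With ArrayList the cost is O(n * m) because each element is
// checked via a linear contains() scan; LinkedHashSet keeps insertion
// order while making membership tests O(1) on average.
public class WalkerSketch {

    // O(n * m): ArrayList.batchRemove calls done.contains() per element.
    static List<Integer> removeAllWithList(List<Integer> pending, List<Integer> done) {
        List<Integer> result = new ArrayList<>(pending);
        result.removeAll(done);
        return result;
    }

    // O(n + m): hash lookups instead of linear scans; order preserved.
    static List<Integer> removeAllWithSet(List<Integer> pending, List<Integer> done) {
        LinkedHashSet<Integer> result = new LinkedHashSet<>(pending);
        result.removeAll(new LinkedHashSet<>(done));
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Integer> pending = List.of(1, 2, 3, 4, 5);
        List<Integer> done = List.of(2, 4);
        System.out.println(removeAllWithList(pending, done)); // [1, 3, 5]
        System.out.println(removeAllWithSet(pending, done));  // [1, 3, 5]
    }
}
```

Both variants return the same result; only the asymptotic cost differs, which is exactly what the sample counts above are measuring.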
[jira] [Commented] (HIVE-11570) Fix PTest2 log4j2.version
[ https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697857#comment-14697857 ] Sergey Shelukhin commented on HIVE-11570: - I wonder if anything needs to be updated for this [~spena] Fix PTest2 log4j2.version -- Key: HIVE-11570 URL: https://issues.apache.org/jira/browse/HIVE-11570 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11570.1.patch {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. @ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11570) Fix PTest2 log4j2.version
[ https://issues.apache.org/jira/browse/HIVE-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697856#comment-14697856 ] Sergey Shelukhin commented on HIVE-11570: - +1 Fix PTest2 log4j2.version -- Key: HIVE-11570 URL: https://issues.apache.org/jira/browse/HIVE-11570 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11570.1.patch {code} + mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 -Dmaven.repo.local=/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/.m2 [INFO] Scanning for projects... [ERROR] The build could not read 1 project - [Help 1] [ERROR] [ERROR] The project org.apache.hive:hive-ptest:1.0 (/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/pom.xml) has 4 errors [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-1.2-api:jar must be a valid version but is '${log4j2.version}'. @ line 69, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-web:jar must be a valid version but is '${log4j2.version}'. @ line 74, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-slf4j-impl:jar must be a valid version but is '${log4j2.version}'. @ line 79, column 16 [ERROR] 'dependencies.dependency.version' for org.apache.logging.log4j:log4j-jcl:jar must be a valid version but is '${log4j2.version}'. @ line 84, column 16 [ERROR] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697934#comment-14697934 ] Alan Gates commented on HIVE-11552: --- I can review it. implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11563) Perflogger loglines are repeated
[ https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11563: - Attachment: HIVE-11563.patch Perflogger loglines are repeated Key: HIVE-11563 URL: https://issues.apache.org/jira/browse/HIVE-11563 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11563.patch After HIVE-11304, the perflogger log lines in qtests are repeated. {code} 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11341) Avoid expensive resizing of ASTNode tree
[ https://issues.apache.org/jira/browse/HIVE-11341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697773#comment-14697773 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11341: -- [~ashutoshc] I am working on fixing them as I can reproduce them locally. Thanks Hari Avoid expensive resizing of ASTNode tree - Key: HIVE-11341 URL: https://issues.apache.org/jira/browse/HIVE-11341 Project: Hive Issue Type: Bug Components: Hive, Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-11341.1.patch, HIVE-11341.2.patch, HIVE-11341.3.patch, HIVE-11341.4.patch, HIVE-11341.5.patch, HIVE-11341.6.patch, HIVE-11341.7.patch
{code}
Stack Trace                                                                            Sample Count  Percentage(%)
parse.BaseSemanticAnalyzer.analyze(ASTNode, Context)                                   1,605         90
parse.CalcitePlanner.analyzeInternal(ASTNode)                                          1,605         90
parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext)       1,605         90
parse.CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)               1,604         90
parse.SemanticAnalyzer.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext)             1,604         90
parse.SemanticAnalyzer.genPlan(QB)                                                     1,604         90
parse.SemanticAnalyzer.genPlan(QB, boolean)                                            1,604         90
parse.SemanticAnalyzer.genBodyPlan(QB, Operator, Map)                                  1,604         90
parse.SemanticAnalyzer.genFilterPlan(ASTNode, QB, Operator, Map, boolean)              1,603         90
parse.SemanticAnalyzer.genFilterPlan(QB, ASTNode, Operator, boolean)                   1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, boolean)                  1,603         90
parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)             1,603         90
parse.SemanticAnalyzer.genAllExprNodeDesc(ASTNode, RowResolver, TypeCheckCtx)          1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx)                          1,603         90
parse.TypeCheckProcFactory.genExprNode(ASTNode, TypeCheckCtx, TypeCheckProcFactory)    1,603         90
lib.DefaultGraphWalker.startWalking(Collection, HashMap)                               1,579         89
lib.DefaultGraphWalker.walk(Node)                                                      1,571         89
java.util.ArrayList.removeAll(Collection)                                              1,433         81
java.util.ArrayList.batchRemove(Collection, boolean)                                   1,433         81
java.util.ArrayList.contains(Object)                                                   1,228         69
java.util.ArrayList.indexOf(Object)                                                    1,228         69
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10497) Upgrade hive branch to latest Tez
[ https://issues.apache.org/jira/browse/HIVE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10497: --- Attachment: (was: HIVE-10497.3.patch) Upgrade hive branch to latest Tez - Key: HIVE-10497 URL: https://issues.apache.org/jira/browse/HIVE-10497 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-10497.1.patch, HIVE-10497.1.patch, HIVE-10497.2.patch Upgrade hive to the upcoming tez-0.7 release -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11568) merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11568: Description: NO PRECOMMIT TESTS merge master into branch Key: HIVE-11568 URL: https://issues.apache.org/jira/browse/HIVE-11568 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11568.nogen.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11571) Fix Hive PTest2 logging configuration
[ https://issues.apache.org/jira/browse/HIVE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697879#comment-14697879 ] Sergey Shelukhin commented on HIVE-11571: - +1 Fix Hive PTest2 logging configuration - Key: HIVE-11571 URL: https://issues.apache.org/jira/browse/HIVE-11571 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Affects Versions: 2.0.0 Reporter: Gopal V Assignee: Gopal V Priority: Trivial Fix For: 2.0.0 Attachments: HIVE-11571.patch {code} [Fatal Error] log4j2.xml:79:3: The element type Loggers must be terminated by the matching end-tag /Loggers. ERROR StatusLogger Error parsing jar:file:/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar!/log4j2.xml org.xml.sax.SAXParseException; systemId: jar:file:/var/lib/jenkins/jobs/PreCommit-HIVE-TRUNK-Build/workspace/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar!/log4j2.xml; lineNumber: 79; columnNumber: 3; The element type Loggers must be terminated by the matching end-tag /Loggers. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257) {code} NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
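The SAXParseException is a plain XML well-formedness error: a `<Loggers>` element at line 79 of the ptest2 log4j2.xml was opened but never closed. An illustrative skeleton of a correctly terminated configuration (names and levels are assumptions, not the real file's contents):

```xml
<!-- Illustrative log4j2.xml skeleton for ptest2; the parse error means
     the </Loggers> end tag below was missing. -->
<Configuration status="WARN">
  <Appenders>
    <Console name="console" target="SYSTEM_ERR">
      <PatternLayout pattern="%d %p %c: %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="console"/>
    </Root>
  </Loggers>
</Configuration>
```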
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697905#comment-14697905 ] Sergey Shelukhin commented on HIVE-11552: - is cc equivalent to 1 in some encoding? :) implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11568) merge master into branch
[ https://issues.apache.org/jira/browse/HIVE-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697911#comment-14697911 ] Alan Gates commented on HIVE-11568: --- +1, looks like all the relevant changes are outside of hbase metastore code. merge master into branch Key: HIVE-11568 URL: https://issues.apache.org/jira/browse/HIVE-11568 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11568.nogen.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11563) Perflogger loglines are repeated
[ https://issues.apache.org/jira/browse/HIVE-11563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11563: +1 Perflogger loglines are repeated Key: HIVE-11563 URL: https://issues.apache.org/jira/browse/HIVE-11563 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11563.patch After HIVE-11304, the perflogger log lines in qtests are repeated. {code} 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,765 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 2015-08-14T12:02:05,766 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(120)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11562) Typo in hive-log4j2.xml throws unknown level exception
[ https://issues.apache.org/jira/browse/HIVE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11562: +1 Typo in hive-log4j2.xml throws unknown level exception -- Key: HIVE-11562 URL: https://issues.apache.org/jira/browse/HIVE-11562 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11562.patch A typo in the default hive-log4j2.xml used for tests causes the following exception:
{code}
2015-08-14 11:26:35,965 WARN Error while converting string [{sys:hive.log.level}] to type [class org.apache.logging.log4j.Level]. Using default value [null].
java.lang.IllegalArgumentException: Unknown level constant [{SYS:HIVE.LOG.LEVEL}].
	at org.apache.logging.log4j.Level.valueOf(Level.java:286)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:230)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters$LevelConverter.convert(TypeConverters.java:226)
	at org.apache.logging.log4j.core.config.plugins.convert.TypeConverters.convert(TypeConverters.java:336)
	at org.apache.logging.log4j.core.config.plugins.visitors.AbstractPluginVisitor.convert(AbstractPluginVisitor.java:130)
	at org.apache.logging.log4j.core.config.plugins.visitors.PluginAttributeVisitor.visit(PluginAttributeVisitor.java:45)
	at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.generateParameters(PluginBuilder.java:247)
	at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:766)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:706)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:698)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:358)
	at org.apache.logging.log4j.core.config.AbstractConfiguration.start(AbstractConfiguration.java:161)
	at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:361)
	at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:426)
	at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:442)
	at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:138)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:147)
	at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
	at org.apache.logging.log4j.LogManager.getContext(LogManager.java:175)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:102)
	at org.apache.logging.log4j.jcl.LogAdapter.getContext(LogAdapter.java:39)
	at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
	at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:40)
	at org.apache.logging.log4j.jcl.LogFactoryImpl.getInstance(LogFactoryImpl.java:55)
	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
	at org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11567) Some trace logs seeped through with new log4j2 changes
[ https://issues.apache.org/jira/browse/HIVE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11567: +1 Some trace logs seeped through with new log4j2 changes -- Key: HIVE-11567 URL: https://issues.apache.org/jira/browse/HIVE-11567 Project: Hive Issue Type: Sub-task Components: Logging Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 2.0.0 Attachments: HIVE-11567.patch Observed hive.log file size difference when running with new log4j2 changes (HIVE-11304). Looks like the default threshold was DEBUG in log4j1.x (as log4j.threshold was misspelt). In log4j2 the default threshold was set to ALL which emitted some trace logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11317) ACID: Improve transaction Abort logic due to timeout
[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11317: -- Attachment: HIVE-11317.5.patch one final tweak: don't start housekeeper unless hive.compactor.initiator.on=true ACID: Improve transaction Abort logic due to timeout Key: HIVE-11317 URL: https://issues.apache.org/jira/browse/HIVE-11317 Project: Hive Issue Type: Bug Components: Metastore, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Labels: triage Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch, HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.patch The logic to abort transactions that have stopped heartbeating is in TxnHandler.timeOutTxns(), which is only invoked when DbTxnManager.getValidTxns() is called. So if there are many txns that need to be timed out and no SQL clients are talking to the system, nothing aborts the dead transactions; compaction can't clean them up, and garbage accumulates in the system. Also, the streaming API doesn't call DbTxnManager at all. Need to move this logic into the Initiator (or some other metastore-side thread), and make sure it is broken up into multiple smaller transactions against the metastore DB. Move timeOutLocks() there as well. Also consider adding a TXNS.COMMENT field, which could record, for example, "Auto aborted due to timeout". The symptom of this is that the system keeps showing more and more open transactions that never seem to go away (and have no locks associated with them). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
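The housekeeper described above boils down to a periodic scan that collects open transactions whose last heartbeat is older than the timeout, then aborts them in small batches. The sketch below models just the selection step; the names and the in-memory Map are illustrative assumptions, since the real implementation works against the metastore database rather than application memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative model of the reaper's selection logic: given each open
// txn's last heartbeat timestamp, return the ids that have exceeded
// the timeout and should be auto-aborted. Hypothetical names; not the
// actual TxnHandler/Initiator code from the HIVE-11317 patches.
public class TxnReaperSketch {

    /** Returns ids of open txns whose heartbeat is older than timeoutMs. */
    static List<Long> findTimedOut(Map<Long, Long> lastHeartbeatMs,
                                   long nowMs, long timeoutMs) {
        List<Long> timedOut = new ArrayList<>();
        for (Map.Entry<Long, Long> e : lastHeartbeatMs.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                timedOut.add(e.getKey());
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // txn 7 last heartbeated at t=0, txn 8 at t=600s; 5-minute timeout.
        Map<Long, Long> beats = Map.of(7L, 0L, 8L, 600_000L);
        System.out.println(findTimedOut(beats, 600_000L, 300_000L)); // [7]
    }
}
```

In the real system this list would then be aborted in several smaller metastore transactions, as the description requests, rather than one large one.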