[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
[ https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040459#comment-14040459 ] Hive QA commented on HIVE-7237: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651909/HIVE-7237.2.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5669 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/556/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/556/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-556/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651909 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever - Key: HIVE-7237 URL: https://issues.apache.org/jira/browse/HIVE-7237 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.13.0 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY Reporter: Douglas Moore Assignee: Navis Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt set hive.exec.parallel=true; will cause the Yarn application instance to linger forever. set hive.exec.parallel=false, the application goes away as soon as hive query is complete. 
The underlying table is an ORC store_sales table compressed with SNAPPY. {code} set hive.exec.parallel=true; select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825 {code} The query will run under Tez and finish in 30 seconds. After 30-40 of these jobs the cluster gets to a point where no jobs will finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
[ https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DJ Choi updated HIVE-7172: -- Attachment: HIVE-7172.patch Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion() - Key: HIVE-7172 URL: https://issues.apache.org/jira/browse/HIVE-7172 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7172.patch {code} ResultSet res = stmt.executeQuery(versionQuery); if (!res.next()) { throw new HiveMetaException("Didn't find version data in metastore"); } String currentSchemaVersion = res.getString(1); metastoreConn.close(); {code} When HiveMetaException is thrown, metastoreConn.close() would be skipped. Also, stmt is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
[ https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DJ Choi updated HIVE-7172: -- Attachment: (was: HIVE-7172.patch) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion() - Key: HIVE-7172 URL: https://issues.apache.org/jira/browse/HIVE-7172 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7172.patch {code} ResultSet res = stmt.executeQuery(versionQuery); if (!res.next()) { throw new HiveMetaException("Didn't find version data in metastore"); } String currentSchemaVersion = res.getString(1); metastoreConn.close(); {code} When HiveMetaException is thrown, metastoreConn.close() would be skipped. Also, stmt is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
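One common way to avoid the kind of leak described in HIVE-7172 is try-with-resources, which closes every declared resource on all exit paths, including the exception path. The sketch below is not the attached patch. TrackedResource, SchemaVersionSketch and readVersion are hypothetical stand-ins for the JDBC connection, statement and result set, so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JDBC Connection/Statement/ResultSet; it only
// records the order in which resources are closed.
class TrackedResource implements AutoCloseable {
    static final List<String> closed = new ArrayList<>();
    private final String name;

    TrackedResource(String name) { this.name = name; }

    @Override
    public void close() { closed.add(name); }
}

class SchemaVersionSketch {
    // Mirrors the shape of getMetaStoreSchemaVersion(): it throws when no row
    // is found, yet try-with-resources still closes every resource.
    static String readVersion(boolean rowFound) {
        try (TrackedResource conn = new TrackedResource("conn");
             TrackedResource stmt = new TrackedResource("stmt");
             TrackedResource res = new TrackedResource("res")) {
            if (!rowFound) {
                throw new IllegalStateException("Didn't find version data in metastore");
            }
            return "placeholder-version"; // not read from a real metastore
        }
    }
}
```

Resources are closed in reverse declaration order, so the result set goes first and the connection last, whether the method returns or throws.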
[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KangHS updated HIVE-7229: - Attachment: (was: HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KangHS updated HIVE-7229: - Attachment: HIVE-7229.patch Change the String compare operation String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
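The suggestion in HIVE-7229 can be illustrated with a small sketch. In Java, != on two String values compares object references, so an empty string built at runtime is not the same object as the "" literal and passes the check. The class and method names below (RegexGuardSketch, compileIfPresent) are hypothetical and are not taken from HiveMetaStore.

```java
import java.util.regex.Pattern;

class RegexGuardSketch {
    // Returns a compiled pattern, or null when the regex is null or empty.
    // isEmpty() inspects the contents, whereas `regex != ""` compares references.
    static Pattern compileIfPresent(String regex) {
        if (regex != null && !regex.isEmpty()) {
            return Pattern.compile(regex);
        }
        return null;
    }

    // Shows why the reference comparison is unreliable: a String created at
    // runtime is a different object from the "" literal even though it is empty.
    static boolean referenceCheckTreatsEmptyAsPresent() {
        String runtimeEmpty = new String("");
        return runtimeEmpty != "";
    }
}
```

With the reference check, an empty value read from configuration at runtime would typically still reach Pattern.compile, which is the behavior the issue asks to avoid.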
[jira] [Resolved] (HIVE-5074) Additional information for mini-mr tests
[ https://issues.apache.org/jira/browse/HIVE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis resolved HIVE-5074. - Resolution: Won't Fix Looking into hive.log seems to be enough. Additional information for mini-mr tests Key: HIVE-5074 URL: https://issues.apache.org/jira/browse/HIVE-5074 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Priority: Trivial Flaky tests of Test(Negative)MinimrCliDriver are hard to track. Test results for diff errors and exception traces for unexpected exceptions would be helpful for debugging. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Bravo, Szehon and Gopal! -- Lefty On Mon, Jun 23, 2014 at 12:53 AM, Gopal V gop...@apache.org wrote: On 6/22/14, 8:42 PM, Carl Steinbach wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Thanks everyone! And congrats Szehon! Cheers, Gopal
[jira] [Commented] (HIVE-7274) Update PTest2 to JClouds 1.7.3
[ https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040511#comment-14040511 ] Hive QA commented on HIVE-7274: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651928/HIVE-7274.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5669 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/557/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/557/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-557/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651928 Update PTest2 to JClouds 1.7.3 -- Key: HIVE-7274 URL: https://issues.apache.org/jira/browse/HIVE-7274 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7274.patch Required to use newer instance types -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2597: Status: Patch Available (was: Open) Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. The following query does not work: select distinct client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. It seems that Hive ignores the aliases when the distinct keyword is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2597: Attachment: HIVE-2597.3.patch.txt Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. The following query does not work: select distinct client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. It seems that Hive ignores the aliases when the distinct keyword is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7273) Hive job fails due to unable to rename reducer output file
[ https://issues.apache.org/jira/browse/HIVE-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040537#comment-14040537 ] Navis commented on HIVE-7273: - The directory is expected to be created in ExecDriver before the job is submitted. Could you provide a query that reproduces the situation? Failure of other tasks in the query can also cause exceptions like the one above. Hive job fails due to unable to rename reducer output file -- Key: HIVE-7273 URL: https://issues.apache.org/jira/browse/HIVE-7273 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: George Wong We ran into this issue in our cluster. The error message is like this: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3 to: hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:470) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:407) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153) {noformat} The log of NameNode shows {noformat} 2014-06-16 20:43:38,582 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3 to /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3 because destination's parent does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
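The NameNode message above says the rename failed because the destination's parent did not exist. As a rough local-filesystem analogy only, the sketch below uses java.nio.file rather than the HDFS API and is not Hive's code or fix. RenameSketch, commit and demo are hypothetical names, and the paths are simplified placeholders. The idea is to make sure the destination's parent directory exists before moving the task output into place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RenameSketch {
    // Moves src to dst, creating dst's parent directory first so the rename
    // cannot fail because the destination's parent is missing.
    static void commit(Path src, Path dst) throws IOException {
        Path parent = dst.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.move(src, dst);
    }

    // Builds a scratch layout loosely shaped like the directories in the
    // report and returns true when the file ends up at the destination.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("rename-sketch");
            Path src = Files.createDirectories(root.resolve("_task_tmp")).resolve("_tmp.part");
            Files.write(src, new byte[] {1, 2, 3});
            Path dst = root.resolve("_tmp").resolve("part"); // parent does not exist yet
            commit(src, dst);
            return Files.exists(dst) && !Files.exists(src);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether the missing parent in the report comes from a skipped directory creation or from another task cleaning up the scratch directory is exactly what the comment above asks the reporter to help determine.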
[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7051: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION - Key: HIVE-7051 URL: https://issues.apache.org/jira/browse/HIVE-7051 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7051.1.patch Same as HIVE-7050 but for partitions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040550#comment-14040550 ] Hive QA commented on HIVE-7271: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651931/HIVE-7271.5.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5669 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dyn_part3 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/558/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/558/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-558/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651931 Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch Did some experiments to see if there's a way to speed up unit tests. 
TestCliDriver seemed to take a lot of time just spinning up/tearing down JVMs. I was also curious to see if running everything on a ram disk would help. Results (I ran tests up to authorization_2): - Current setup: 40 minutes - Single JVM (not using child JVM to run all queries): 8 minutes - Single JVM + ram disk: 7 minutes So the ram disk didn't help that much. But running tests in single JVM seems worthwhile doing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7194) authorization_ctas.q failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040557#comment-14040557 ] Ashutosh Chauhan commented on HIVE-7194: +1 LGTM authorization_ctas.q failing on trunk - Key: HIVE-7194 URL: https://issues.apache.org/jira/browse/HIVE-7194 Project: Hive Issue Type: Task Components: Authorization Reporter: Ashutosh Chauhan Assignee: Thejas M Nair Attachments: HIVE-7194.1.patch.txt, HIVE-7194.patch Need to update .q.out file -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7265) BINARY columns use BytesWritable::getBytes() without ::getLength()
[ https://issues.apache.org/jira/browse/HIVE-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040560#comment-14040560 ] Ashutosh Chauhan commented on HIVE-7265: +1 BINARY columns use BytesWritable::getBytes() without ::getLength() -- Key: HIVE-7265 URL: https://issues.apache.org/jira/browse/HIVE-7265 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis Priority: Minor Attachments: HIVE-7265.1.patch.txt The Text conversion for BINARY columns does {code} case BINARY: t.set(((BinaryObjectInspector) inputOI).getPrimitiveWritableObject(input).getBytes()); return t; {code} This omission was noticed while investigating a different String related bug, in a list of functions which call getBytes() without calling getSize/getLength(). -- This message was sent by Atlassian JIRA (v6.2#6252)
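The problem reported in HIVE-7265 can be shown with a small sketch. BytesWritable.getBytes() returns the backing array, and only the first getLength() bytes of it are valid. To stay runnable with only the JDK, the example uses a hypothetical stand-in class (ReusableBuffer) instead of the Hadoop classes; BinaryToTextSketch and its methods are also illustrative names, not Hive code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a reusable byte container whose backing array may
// be larger than the valid data, which is the situation the issue describes.
class ReusableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = Arrays.copyOf(src, src.length); // grow, never shrink
        } else {
            System.arraycopy(src, 0, bytes, 0, src.length);
        }
        length = src.length;
    }

    byte[] getBytes() { return bytes; }   // whole backing array
    int getLength() { return length; }    // number of valid bytes
}

class BinaryToTextSketch {
    // Incorrect: converts the whole backing array, including stale bytes.
    static String withoutLength(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    // Correct: converts only the valid prefix.
    static String withLength(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }
}
```

If the buffer first holds "hello world" and is then reused for "hi", the length-aware conversion returns "hi" while the other returns "hillo world", because the tail of the earlier value is still in the array.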
[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7205: --- Status: Open (was: Patch Available) Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator 
aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col0 type: string expr: _col1
[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040565#comment-14040565 ] Ashutosh Chauhan commented on HIVE-7232: [~gopalv] Do you want to review this one? VectorReduceSink is emitting incorrect JOIN keys Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, q5.explain.txt, q5.sql After HIVE-7121, tpc-h query5 returns incorrect results. Thanks to [~navis], it has been tracked down to the auto-parallel settings, which were initialized for ReduceSinkOperator but not for VectorReduceSinkOperator. The vector version inherits from it, but doesn't call super.initializeOp() or set up the variable correctly from ReduceSinkDesc. The query is tpc-h query5, with extra NULL checks just to be sure. {code} SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' and l_orderkey is not null and c_custkey is not null and l_suppkey is not null and c_nationkey is not null and s_nationkey is not null and n_regionkey is not null GROUP BY n_name ORDER BY revenue DESC; {code} The reducer which has the issue has the following plan {code} Reducer 3 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {KEY.reducesinkkey0} {VALUE._col2} 1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3} outputColumnNames: _col0, _col3, _col10, _col11, _col14 Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator
key expressions: _col10 (type: int) sort order: + Map-reduce partition columns: _col10 (type: int) Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
[ https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040568#comment-14040568 ] Ashutosh Chauhan commented on HIVE-7237: +1 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever - Key: HIVE-7237 URL: https://issues.apache.org/jira/browse/HIVE-7237 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.13.0 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY Reporter: Douglas Moore Assignee: Navis Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt set hive.exec.parallel=true; will cause the YARN application instance to linger forever. With set hive.exec.parallel=false, the application goes away as soon as the Hive query is complete. The underlying table is an ORC store_sales table compressed with SNAPPY. {code} set hive.exec.parallel=true; select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825 {code} The query will run under Tez and finish in 30 seconds. After 30-40 of these jobs the cluster gets to a point where no jobs will finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias
[ https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040570#comment-14040570 ] Ashutosh Chauhan commented on HIVE-7235: [~rhbutani] Can you take a look at this one? TABLESAMPLE on join table is regarded as alias -- Key: HIVE-7235 URL: https://issues.apache.org/jira/browse/HIVE-7235 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7235.1.patch.txt {noformat} SELECT c_custkey, o_custkey FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on c_custkey = o_custkey; {noformat} Fails with NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040579#comment-14040579 ] Ashutosh Chauhan commented on HIVE-3392: I am fine with reopening this. [~appodictic] What do you think? Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt natty@hadoop1:~$ hive hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar; Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar hive> create table test (a int) row format serde 'hive.serde.JSONSerDe'; OK Time taken: 2.399 seconds natty@hadoop1:~$ hive hive> drop table test; FAILED: Hive Internal Error: java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist)) java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490) at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210) at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260) ... 20 more hive> add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar; Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar hive> drop table test; OK Time taken: 0.658 seconds hive> -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7265) BINARY columns use BytesWritable::getBytes() without ::getLength()
[ https://issues.apache.org/jira/browse/HIVE-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7265: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! BINARY columns use BytesWritable::getBytes() without ::getLength() -- Key: HIVE-7265 URL: https://issues.apache.org/jira/browse/HIVE-7265 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7265.1.patch.txt The Text conversion for BINARY columns does {code} case BINARY: t.set(((BinaryObjectInspector) inputOI).getPrimitiveWritableObject(input).getBytes()); return t; {code} This omission was noticed while investigating a different String related bug, in a list of functions which call getBytes() without calling getSize/getLength(). -- This message was sent by Atlassian JIRA (v6.2#6252)
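The hazard reported in HIVE-7265 can be illustrated without Hadoop on the classpath. The sketch below is only an illustration: PaddedBytes is a hypothetical stand-in that mimics a writable whose backing array may be larger than the valid data, so that only the first getLength() bytes are meaningful. The class and method names are assumptions made for this example and are not Hive code.

```java
import java.util.Arrays;

public class BinaryCopyDemo {
    /** Hypothetical stand-in for a writable whose backing array can exceed the valid length. */
    static final class PaddedBytes {
        private final byte[] buffer;
        private final int length;

        PaddedBytes(byte[] valid, int capacity) {
            // The backing array is zero-padded up to the requested capacity.
            this.buffer = Arrays.copyOf(valid, Math.max(capacity, valid.length));
            this.length = valid.length;
        }

        byte[] getBytes() { return buffer; }  // whole backing array, padding included
        int getLength() { return length; }    // number of valid bytes
    }

    /** Unsafe pattern: treats the whole backing array as data. */
    static byte[] copyIgnoringLength(PaddedBytes b) {
        return b.getBytes().clone();
    }

    /** Safer pattern: copies only the valid prefix. */
    static byte[] copyValidBytes(PaddedBytes b) {
        return Arrays.copyOfRange(b.getBytes(), 0, b.getLength());
    }

    public static void main(String[] args) {
        PaddedBytes value = new PaddedBytes(new byte[] {1, 2, 3}, 8);
        System.out.println(copyIgnoringLength(value).length); // padding leaks into the copy
        System.out.println(copyValidBytes(value).length);     // only the valid bytes
    }
}
```

In Hive itself the equivalent change would presumably pass an explicit offset and length to the Text setter rather than the bare array; the committed patch is not shown in this message, so that remains an assumption.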
[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-7094: - Resolution: Fixed Fix Version/s: 0.14.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks David! Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Fix For: 0.14.0 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, HIVE-7094.5.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
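The refactoring described in HIVE-7094 follows the template-method shape, which can be sketched roughly as below. This is a simplified illustration, not the committed patch: the three container class names come from the issue description, while LocalFileWriter, getLocalFileWriter and the String-based records are hypothetical stand-ins for the real RecordWriter, ObjectInspector, SerDe and OutputJobInfo plumbing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriterContainerSketch {
    /** Hypothetical bundle standing in for the per-record writer, inspector, SerDe and job info. */
    static final class LocalFileWriter {
        final String target;
        LocalFileWriter(String target) { this.target = target; }
    }

    /** Common code lives here; subclasses only decide which writer a record goes to. */
    abstract static class FileRecordWriterContainer {
        final List<String> log = new ArrayList<>();

        final void write(String partitionValue, String record) {
            LocalFileWriter writer = getLocalFileWriter(partitionValue);
            log.add(writer.target + " <- " + record);
        }

        /** Abstract hook replacing the former if (dynamicPartitioning) branches. */
        abstract LocalFileWriter getLocalFileWriter(String partitionValue);
    }

    /** Static partitioning: one writer, fixed when the container is created. */
    static final class StaticFileRecordWriterContainer extends FileRecordWriterContainer {
        private final LocalFileWriter writer;
        StaticFileRecordWriterContainer(String partition) { this.writer = new LocalFileWriter(partition); }
        @Override LocalFileWriter getLocalFileWriter(String partitionValue) { return writer; }
    }

    /** Dynamic partitioning: writers are created lazily, one per partition value seen. */
    static final class DynamicFileRecordWriterContainer extends FileRecordWriterContainer {
        final Map<String, LocalFileWriter> writers = new HashMap<>();
        @Override LocalFileWriter getLocalFileWriter(String partitionValue) {
            return writers.computeIfAbsent(partitionValue, LocalFileWriter::new);
        }
    }

    public static void main(String[] args) {
        DynamicFileRecordWriterContainer dynamic = new DynamicFileRecordWriterContainer();
        dynamic.write("dt=2014-06-22", "row1");
        dynamic.write("dt=2014-06-23", "row2");
        System.out.println(dynamic.log);
    }
}
```

The point of the design is that write() no longer needs to know which partitioning mode is active; the subclass chosen at construction time carries that decision.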
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040686#comment-14040686 ] Hive QA commented on HIVE-7159: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651930/HIVE-7159.11.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/559/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/559/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-559/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12651930 For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, HIVE-7159.9.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
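The equivalence behind HIVE-7159 can be checked on a toy example. The sketch below is an in-memory illustration only, assuming SQL semantics in which a NULL key never satisfies an equality join condition; the class and method names are made up for the example and have nothing to do with Hive's planner code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NotNullPushdownSketch {
    /** Inner equi-join on nullable keys with SQL semantics: NULL never equals anything. */
    static List<String> innerJoin(List<Integer> a, List<Integer> b) {
        List<String> out = new ArrayList<>();
        for (Integer x : a) {
            for (Integer y : b) {
                if (x != null && y != null && x.equals(y)) {
                    out.add(x + "=" + y);
                }
            }
        }
        return out;
    }

    /** The pushed-down predicate: keep only rows whose join key is not null. */
    static List<Integer> whereNotNull(List<Integer> rows) {
        List<Integer> kept = new ArrayList<>();
        for (Integer r : rows) {
            if (r != null) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, null, 2, null);
        List<Integer> b = Arrays.asList(null, 2, 2, 3);
        // Both forms produce the same rows; the second feeds fewer rows into the join.
        System.out.println(innerJoin(a, b));
        System.out.println(innerJoin(whereNotNull(a), whereNotNull(b)));
    }
}
```

In a distributed setting the benefit is that the null-keyed rows are dropped before the shuffle instead of all landing on one reducer.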
Re: branch for cbo work
Hi Ashutosh, Thanks for the information. I do have some questions, but my concern is more about the design doc than about branching. Nevertheless, I think it would be very helpful to clarify the design before we put a lot of effort into it. From the design doc, it seems that the cost estimation is based on Tez, while the optimization occurs at the logical layer. I'd think that CBO is valuable to either engine. If there is anything that's specific to a particular engine, then that optimization should stay at the engine layer. My original comments were posted on HIVE-5775. Please let me know your thoughts. I'd also like to hear from the community. https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987 Thanks, Xuefu On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi all, Some of you may have noticed that cost based optimizer work is going on at HIVE-5775. John has put up an initial patch there as well. But there is a lot more work that needs to be done. Following our tradition of doing large feature work in a branch, I propose that we create a branch, commit this patch in it, and then continue to work on it in the branch to improve it. Hopefully, we can get it in shape so that we can merge it into trunk once it's ready. Unless I hear otherwise, I plan to create a branch and commit this initial patch by early next week. Design doc is located here: https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive Thanks, Ashutosh
[jira] [Commented] (HIVE-7270) SerDe Properties are not considered by show create table Command
[ https://issues.apache.org/jira/browse/HIVE-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040784#comment-14040784 ] Hive QA commented on HIVE-7270: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651932/HIVE-7270.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5669 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/564/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651932 SerDe Properties are not considered by show create table Command Key: HIVE-7270 URL: https://issues.apache.org/jira/browse/HIVE-7270 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Reporter: Renil J Assignee: Navis Priority: Minor Attachments: HIVE-7270.1.patch.txt The Hive table DDL generated by the show create table target_table command does not contain the SerDe properties of the target table, even though the table has specific SerDe properties. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040788#comment-14040788 ] Hive QA commented on HIVE-7229: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651937/HIVE-7229.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/566/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/566/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-566/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-566/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! 
-d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1604814. At revision 1604814. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12651937 String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
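The problem raised in HIVE-7229 is that comparing a String against an empty literal with the != operator compares object references, not contents. The sketch below is a standalone illustration; the class and method names are hypothetical and only the two comparison styles are taken from the issue.

```java
public class EmptyStringCheckSketch {
    /** Reference comparison, as in the reported code: an empty string held in a different object passes. */
    static boolean nonEmptyByReference(String regex) {
        return regex != null && regex != "";
    }

    /** Content comparison, as suggested in the issue. */
    static boolean nonEmptyByContent(String regex) {
        return regex != null && !regex.isEmpty();
    }

    public static void main(String[] args) {
        // new String("") has empty content but is not the same object as the literal "".
        String emptyFromConfig = new String("");
        System.out.println(nonEmptyByReference(emptyFromConfig)); // true: wrongly treated as non-empty
        System.out.println(nonEmptyByContent(emptyFromConfig));   // false
    }
}
```

A value read from a configuration file or built at runtime is typically not the interned literal, which is why the reference check can let an empty regex through to Pattern.compile.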
[jira] [Commented] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
[ https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040786#comment-14040786 ] Hive QA commented on HIVE-7172: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651942/HIVE-7172.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/565/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/565/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-565/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-565/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! 
-d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientpositive/show_create_table_serde.q.out' Reverted 'ql/src/test/queries/clientpositive/show_create_table_serde.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1604814. At revision 1604814. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12651942 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion() - Key: HIVE-7172 URL: https://issues.apache.org/jira/browse/HIVE-7172 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7172.patch {code} ResultSet res = stmt.executeQuery(versionQuery); if (!res.next()) { throw new HiveMetaException("Didn't find version data in metastore"); } String currentSchemaVersion = res.getString(1); metastoreConn.close(); {code} When HiveMetaException is thrown, metastoreConn.close() would be skipped. stmt is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
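The leak described in HIVE-7172 is the usual pattern of a close() call that sits after a statement that may throw. A minimal sketch of the difference follows; it uses a hypothetical TrackedResource in place of the real JDBC objects so that it runs without a database, and the returned version string is a placeholder, not real metastore data.

```java
public class CloseOnErrorSketch {
    /** Hypothetical stand-in for a connection or statement; records whether close() ran. */
    static final class TrackedResource implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    /** Mirrors the reported shape: close() is skipped when the exception is thrown. */
    static String readVersionLeaky(TrackedResource conn, boolean hasRow) {
        if (!hasRow) {
            throw new IllegalStateException("Didn't find version data in metastore");
        }
        String version = "placeholder-version";
        conn.close();
        return version;
    }

    /** try-with-resources closes the resource on both the normal and the exceptional path. */
    static String readVersionSafe(TrackedResource conn, boolean hasRow) {
        try (TrackedResource c = conn) {
            if (!hasRow) {
                throw new IllegalStateException("Didn't find version data in metastore");
            }
            return "placeholder-version";
        }
    }

    public static void main(String[] args) {
        TrackedResource leaky = new TrackedResource();
        try { readVersionLeaky(leaky, false); } catch (IllegalStateException expected) { }
        TrackedResource safe = new TrackedResource();
        try { readVersionSafe(safe, false); } catch (IllegalStateException expected) { }
        System.out.println(leaky.closed + " " + safe.closed);
    }
}
```

With real JDBC on Java 7 or later, Connection, Statement and ResultSet all implement AutoCloseable, so they can be declared together in one try header; on older Java versions a finally block would serve the same purpose. Whether the attached patch takes either approach is not stated in this message.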
[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table
[ https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040802#comment-14040802 ] Edward Capriolo commented on HIVE-3392: --- Please feel free to take over the review. I will not have any time at the moment. Thanks! Hive unnecessarily validates table SerDes when dropping a table --- Key: HIVE-3392 URL: https://issues.apache.org/jira/browse/HIVE-3392 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jonathan Natkins Assignee: Navis Labels: patch Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, HIVE-3392.Test Case - with_trunk_version.txt natty@hadoop1:~$ hive hive add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar; Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar hive create table test (a int) row format serde 'hive.serde.JSONSerDe'; OK Time taken: 2.399 seconds natty@hadoop1:~$ hive hive drop table test; FAILED: Hive Internal Error: java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist)) java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe hive.serde.JSONSerDe does not exist) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490) at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe com.cloudera.hive.serde.JSONSerDe does not exist) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260) ... 20 more hive add jar /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar; Added /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar to class path Added resource: /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar hive drop table test; OK Time taken: 0.658 seconds hive -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Congratulations guys! On Mon, Jun 23, 2014 at 2:09 AM, Lefty Leverenz leftylever...@gmail.com wrote: Bravo, Szehon and Gopal! -- Lefty On Mon, Jun 23, 2014 at 12:53 AM, Gopal V gop...@apache.org wrote: On 6/22/14, 8:42 PM, Carl Steinbach wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Thanks everyone! And congrats Szehon! Cheers, Gopal -- Swarnim
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl
[jira] [Commented] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040900#comment-14040900 ] Hive QA commented on HIVE-2597: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651945/HIVE-2597.3.patch.txt {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 5670 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/568/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/568/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-568/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651945 Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. The following query does not work: select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. Seems like Hive is ignoring the aliases when the distinct keyword is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7118) Oracle upgrade schema scripts do not map Java long datatype columns correctly for transaction related tables
[ https://issues.apache.org/jira/browse/HIVE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040929#comment-14040929 ] Alan Gates commented on HIVE-7118: -- +1 Oracle upgrade schema scripts do not map Java long datatype columns correctly for transaction related tables Key: HIVE-7118 URL: https://issues.apache.org/jira/browse/HIVE-7118 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Environment: Oracle DB Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.14.0 Attachments: HIVE-7118-0.13.0.1.patch, HIVE-7118.1.patch In Transaction related tables, Java long column fields are mapped to NUMBER(10) which results in failure to persist the transaction ids which are incompatible. Following error is seen: {noformat} ORA-01438: value larger than specified precision allowed for this column {noformat} NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
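The ORA-01438 failure in HIVE-7118 comes down to digit counts: a NUMBER(10) column holds at most 10 decimal digits, while a Java long can need up to 19. The check below is a small sketch of that arithmetic; fitsPrecision is a hypothetical helper written for this example, not part of Hive or of any Oracle driver.

```java
import java.math.BigDecimal;

public class NumberPrecisionSketch {
    /** True if the value can be stored in a NUMBER(precision) column with scale 0. */
    static boolean fitsPrecision(long value, int precision) {
        // BigDecimal.precision() is the number of decimal digits in the unscaled value.
        return BigDecimal.valueOf(value).precision() <= precision;
    }

    public static void main(String[] args) {
        System.out.println(fitsPrecision(9_999_999_999L, 10));  // largest value NUMBER(10) can hold
        System.out.println(fitsPrecision(10_000_000_000L, 10)); // one more digit than allowed
        System.out.println(BigDecimal.valueOf(Long.MAX_VALUE).precision()); // digits needed for any long
    }
}
```

So a column meant to persist arbitrary Java long values would need a precision of at least 19; the exact type chosen by the attached patches is not quoted in this message.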
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Congrats Gopal and Szehon! --Vaibhav On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote: Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Congrats! On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Gopal and Szehon! --Vaibhav On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote: Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040970#comment-14040970 ] Laljo John Pullokkaran commented on HIVE-5775: -- The cost model as described in the doc assumes TEZ as the execution layer. Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040986#comment-14040986 ] Gopal V commented on HIVE-5775: --- [~xuefuz]: The CBO model rewrites queries using cardinality statistics. The tuple count and distinct value count should not affect which physical layer it runs on; having the CBO split up/reorder a 3-way map-join into 2 phases (or vertices) should generate identical plans in both. MR would run 2 map-only phases with their own local tasks and hashtable uploads; Tez would run 2 vertices with their own broadcast tasks. Tez can reduce runtimes further by removing the intermediate IO cost and co-scheduling the second vertex in the same container as the first, but that is not assumed, as it is not a strong guarantee in a busy cluster. The Tez runtime model is faster, but the logical cost does not change, as the number of rows read off disk, the number written to disk, and the number of distinct keys remain the same. In fact, as it exists today, because it applies equally to both Tez and MR, it ignores a lot of Tez's opportunistic/runtime optimizations like container reuse, e.g. each vertex in Tez is a different process. It is up to the Tez DAG planner to attend to such runtime optimization details. Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
congrats to Gopal and Szehon! Thanks Hari On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com wrote: Congrats! On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Gopal and Szehon! --Vaibhav On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote: Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Congrats! On Jun 23, 2014, at 10:28 AM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: congrats to Gopal and Szehon! Thanks Hari On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com wrote: Congrats! On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Gopal and Szehon! --Vaibhav On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote: Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl
Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho
Congrats Gopal and Szehon! On Mon, Jun 23, 2014 at 10:34 AM, Jason Dere jd...@hortonworks.com wrote: Congrats! On Jun 23, 2014, at 10:28 AM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: congrats to Gopal and Szehon! Thanks Hari On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com wrote: Congrats! On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Congrats Gopal and Szehon! --Vaibhav On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote: Thank you all very much, and congrats Gopal! Szehon On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho committers on the Apache Hive Project. Please join me in congratulating Gopal and Szehon! Thanks. - Carl
Re: branch for cbo work
Following may help in reducing the confusion: 1. In design doc the cost formula is for choosing Join Algorithm. The cost formula as described in the doc assumes Tez execution. 2. However current work on CBO doesn’t include Join algorithm selection. Instead it rearranges Join based on Join cardinality and NDV. In other words Join reordering is not dependent on Physical Execution Layer (Tez or MR). 3. When we decide to do Join Algorithm Selection we can fit in cost formula for both a) MR b) Tez. This way, based on the physical execution layer we can select best Join Algorithm/Order. 4. The cost formula for Join Algorithm selection is not that different between MR and Tez (except for intermediate HDFS writes). So assume that CBO can support both execution layers rather easily. 5. CBO framework allows you to plug and play any cost model. There is no hard coupling. Thanks John On Mon, Jun 23, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote: Hi Ashutosh, Thanks for your information. I do have some questions, but my concern is more on the design doc than branching. Nevertheless, I think it would be very helpful to clarify in the design before we actually put a lot of effort. From the design doc, it seems that the cost estimation is based on Tez, while the optimization occurs on logical layer. I'd think that CBO is valuable to either engine. If there is anything that's specific to a particular engine, then that optimization should stay at engine layer. My original comments were posted on HIVE-5775. Please let me know your thoughts. I'd also like to hear from the community. https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987 Thanks, Xuefu On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi all, Some of you may have noticed that cost based optimizer work is going on at HIVE-5775. John has put up an initial patch there as well. But there is a lot more work that needs to be done. Following our tradition of large feature work in branch, I propose that we create a branch and commit this patch in it and then continue to work on it in branch to improve it. Hopefully, we can get it in shape so that we can merge it in trunk once it's ready. Unless I hear otherwise, I plan to create a branch and commit this initial patch by early next week. Design doc is located here : https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive Thanks, Ashutosh
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041053#comment-14041053 ] Laljo John Pullokkaran commented on HIVE-5775: -- Following may help in reducing the confusion: 1. In design doc the cost formula is for choosing Join Algorithm. The cost formula as described in the doc assumes Tez execution. 2. However current work on CBO doesn’t include Join algorithm selection. Instead it rearranges Join based on Join cardinality and NDV. In other words Join reordering is not dependent on Physical Execution Layer (Tez or MR). 3. When we decide to do Join Algorithm Selection we can fit in cost formula for both a) MR b) Tez. This way, based on the physical execution layer we can select best Join Algorithm/Order. 4. The cost formula for Join Algorithm selection is not that different between MR and Tez (except for intermediate HDFS writes). So assume that CBO can support both execution layers rather easily. 5. CBO framework allows you to plug and play any cost model. There is no hard coupling. Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
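Point 2 of the comment above says the current CBO work reorders joins using join cardinality and NDV (number of distinct values), independent of the execution layer. As a rough illustration only, the sketch below uses the common textbook estimate for an equi-join, rows(A) * rows(B) / max(NDV(A), NDV(B)). The class name, method name and statistics are hypothetical, and this is not claimed to be the formula used in the HIVE-5775 patch.

```java
public class JoinCardinalitySketch {

    /**
     * Textbook equi-join row estimate. This is an assumption made for
     * illustration; it is not taken from the HIVE-5775 patch.
     */
    static double estimateJoinRows(long leftRows, long leftNdv,
                                   long rightRows, long rightNdv) {
        // Guard against a zero NDV so the division is always defined.
        long maxNdv = Math.max(Math.max(leftNdv, rightNdv), 1L);
        return (double) leftRows * (double) rightRows / (double) maxNdv;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: one fact table joined to two dimension tables.
        double factWithSmallDim = estimateJoinRows(1_000_000L, 100L, 100L, 100L);
        double factWithLargeDim = estimateJoinRows(1_000_000L, 50_000L, 200_000L, 50_000L);

        // A reordering step would prefer to run the join with the smaller
        // estimated output first. Nothing here refers to Tez or MR.
        System.out.println("fact x small dim: " + factWithSmallDim);
        System.out.println("fact x large dim: " + factWithLargeDim);
    }
}
```

The inputs are only row counts and distinct-value counts, which is why an ordering decision of this kind can stay in the logical layer.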
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041056#comment-14041056 ] Xuefu Zhang commented on HIVE-5775: --- Thanks for the clarification, [~gopalv]. We are in total agreement if what is put in the logical layer is the optimization that's applicable to either execution engine and if execution engine specific optimization is put in the execution layer. Maybe the document can be updated to make this explicit to avoid confusion/misunderstanding from others. {quote} The cost model as described in the doc assumes TEZ as the execution layer. {quote} Not sure if I understand [~jpullokkaran] correctly. If the cost model is based on Tez, then we shall only use a model that's common for both Tez and MR when rewriting the query, right? Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7241) Wrong lock acquired for alter table rename partition
[ https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041061#comment-14041061 ] Alan Gates commented on HIVE-7241: -- root_dir_external_table and authorization_ctas fail on trunk. The other two pass in my local tests on both trunk and with my patch, so I do not believe any of these are related to the patch. Wrong lock acquired for alter table rename partition Key: HIVE-7241 URL: https://issues.apache.org/jira/browse/HIVE-7241 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7241.patch, HIVE-7241.patch Doing an alter table foo partition (bar='x') rename to partition (bar='y') acquires a read lock on table foo. It should instead acquire an exclusive lock on partition bar=x. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7274) Update PTest2 to JClouds 1.7.3
[ https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041075#comment-14041075 ] Szehon Ho commented on HIVE-7274: - Thanks for researching that. +1 Update PTest2 to JClouds 1.7.3 -- Key: HIVE-7274 URL: https://issues.apache.org/jira/browse/HIVE-7274 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7274.patch Required to use newer instance types -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041085#comment-14041085 ] Laljo John Pullokkaran commented on HIVE-5775: -- The Cost Model described doesn't apply to the current CBO work or to the proposed branch. It will apply only to Join Algorithm selection, which is not part of the current work. IMO moving join reordering to the physical optimizer is not the correct solution. I would rather leave it in logical, since after doing join reordering you may be able to do other optimizations like new predicate push down, transitive inferences, etc. When we get around to doing Join Algorithm selection there will be two cost formulas, one for MR and one for Tez. I think the best solution is to support both cost models and decide which one to apply based on the physical execution layer. I will update the doc. Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
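The comment above proposes keeping two cost formulas, one for MR and one for Tez, and picking one based on the physical execution layer. A minimal sketch of such a plug-in point might look like the following. Every name here (`JoinCostModel`, `Engine`, `HDFS_WRITE_WEIGHT`) and both formulas are hypothetical; the only detail taken from the thread is that MR pays for intermediate HDFS writes and Tez does not.

```java
import java.util.EnumMap;
import java.util.Map;

public class PluggableCostModelSketch {

    enum Engine { MR, TEZ }

    /** Hypothetical plug-in point: one implementation per execution layer. */
    interface JoinCostModel {
        double cost(double inputRows, double outputRows);
    }

    // Made-up weight for the intermediate HDFS write that MR pays between phases.
    static final double HDFS_WRITE_WEIGHT = 2.0;

    static final Map<Engine, JoinCostModel> MODELS = new EnumMap<>(Engine.class);

    static {
        // Illustrative formulas only; the real ones would come from the design doc.
        MODELS.put(Engine.TEZ, (inRows, outRows) -> inRows + outRows);
        MODELS.put(Engine.MR, (inRows, outRows) -> inRows + outRows + HDFS_WRITE_WEIGHT * outRows);
    }

    /** Looks up the model registered for the engine and applies it. */
    static double cost(Engine engine, double inputRows, double outputRows) {
        return MODELS.get(engine).cost(inputRows, outputRows);
    }

    public static void main(String[] args) {
        for (Engine engine : Engine.values()) {
            System.out.println(engine + " cost: " + cost(engine, 100.0, 10.0));
        }
    }
}
```

Because the optimizer only sees the `JoinCostModel` interface in this sketch, a future engine could register its own model without touching the join reordering logic.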
Re: branch for cbo work
Thanks for the clarification. I'm happily on board with this as long as our approach takes account of the differences between execution engines. While MR and Tez might be similar, there could be new execution engines in the future which might not be that similar. Ideally, all execution engines should benefit from this effort yet room is kept to allow for specific optimizations for a particular engine. It's great if we all see that. Thanks, Xuefu On Mon, Jun 23, 2014 at 11:05 AM, John Pullokkaran jpullokka...@hortonworks.com wrote: Following may help in reducing the confusion: 1. In design doc the cost formula is for choosing Join Algorithm. The cost formula as described in the doc assumes Tez execution. 2. However current work on CBO doesn’t include Join algorithm selection. Instead it rearranges Join based on Join cardinality and NDV. In other words Join reordering is not dependent on Physical Execution Layer (Tez or MR). 3. When we decide to do Join Algorithm Selection we can fit in cost formula for both a) MR b) Tez. This way, based on the physical execution layer we can select best Join Algorithm/Order. 4. The cost formula for Join Algorithm selection is not that different between MR and Tez (except for intermediate HDFS writes). So assume that CBO can support both execution layers rather easily. 5. CBO framework allows you to plug and play any cost model. There is no hard coupling. Thanks John On Mon, Jun 23, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote: Hi Ashutosh, Thanks for your information. I do have some questions, but my concern is more on the design doc than branching. Nevertheless, I think it would be very helpful to clarify in the design before we actually put a lot of effort. From the design doc, it seems that the cost estimation is based on Tez, while the optimization occurs on logical layer. I'd think that CBO is valuable to either engine. If there is anything that's specific to a particular engine, then that optimization should stay at engine layer. My original comments were posted on HIVE-5775. Please let me know your thoughts. I'd also like to hear from the community. https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987 Thanks, Xuefu On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hi all, Some of you may have noticed that cost based optimizer work is going on at HIVE-5775. John has put up an initial patch there as well. But there is a lot more work that needs to be done. Following our tradition of large feature work in branch, I propose that we create a branch and commit this patch in it and then continue to work on it in branch to improve it. Hopefully, we can get it in shape so that we can merge it in trunk once it's ready. Unless I hear otherwise, I plan to create a branch and commit this initial patch by early next week. Design doc is located here : https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive Thanks, Ashutosh
[jira] [Commented] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock
[ https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041101#comment-14041101 ] Alan Gates commented on HIVE-7242: -- dynpart_sort_optimization passes on both trunk and with the patch when I run it. root_dir_external_table and authorization_ctas are broken on trunk right now. So I don't believe these test failures are related to this patch. alter table drop partition is acquiring the wrong type of lock -- Key: HIVE-7242 URL: https://issues.apache.org/jira/browse/HIVE-7242 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.14.0 Attachments: HIVE-7242.patch Doing an alter table foo drop partition ('bar=x') acquired a shared-write lock on partition bar=x. It should be acquiring an exclusive lock in that case. -- This message was sent by Atlassian JIRA (v6.2#6252)
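HIVE-7241 and HIVE-7242 both ask for an exclusive lock on the affected partition when it is renamed or dropped. The sketch below illustrates why a weaker lock is a problem, using a simplified compatibility rule assumed for illustration (exclusive conflicts with everything, two shared-write locks conflict, everything else coexists). The enums and methods are hypothetical and are not Hive's lock manager classes.

```java
public class PartitionLockSketch {

    enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    // Only the operations discussed in the two issues, plus a plain read.
    enum Operation { READ_PARTITION, DROP_PARTITION, RENAME_PARTITION }

    /** Lock the operation should request on the partition (hypothetical helper). */
    static LockType lockFor(Operation op) {
        switch (op) {
            case READ_PARTITION:
                return LockType.SHARED_READ;
            case DROP_PARTITION:
            case RENAME_PARTITION:
                // Both DDL operations invalidate the partition for concurrent users.
                return LockType.EXCLUSIVE;
            default:
                throw new IllegalArgumentException("unknown operation: " + op);
        }
    }

    /** Simplified compatibility rule, assumed for this sketch. */
    static boolean compatible(LockType held, LockType requested) {
        if (held == LockType.EXCLUSIVE || requested == LockType.EXCLUSIVE) {
            return false;
        }
        return !(held == LockType.SHARED_WRITE && requested == LockType.SHARED_WRITE);
    }

    public static void main(String[] args) {
        LockType reader = lockFor(Operation.READ_PARTITION);
        // With a shared-write lock, a drop could proceed while a reader holds the partition.
        System.out.println("reader vs shared-write drop: " + compatible(reader, LockType.SHARED_WRITE));
        // With an exclusive lock, the drop has to wait for the reader.
        System.out.println("reader vs exclusive drop: " + compatible(reader, lockFor(Operation.DROP_PARTITION)));
    }
}
```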
Re: branch for cbo work
I see that design doc doesn't talk about plug and play aspect of cost model; and it also doesn't make it clear that cost model described is for Join Algorithm selection; also it doesn't have cost model for MR. I will update the doc appropriately. Thanks John
[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation
[ https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7246: - Status: Open (was: Patch Available) Patch no longer applies after recent checkins. Hive transaction manager hardwires bonecp as the JDBC pooling implementation Key: HIVE-7246 URL: https://issues.apache.org/jira/browse/HIVE-7246 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7246.patch Currently TxnManager hardwires BoneCP as the JDBC connection pooling implementation. Instead it should use the same connection pooling that the metastore does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041115#comment-14041115 ] Xuefu Zhang commented on HIVE-5775: --- Cool. Thanks for the clarifications. Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation
[ https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7246: - Status: Patch Available (was: Open) Hive transaction manager hardwires bonecp as the JDBC pooling implementation Key: HIVE-7246 URL: https://issues.apache.org/jira/browse/HIVE-7246 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7246-2.patch, HIVE-7246.patch Currently TxnManager hardwires BoneCP as the JDBC connection pooling implementation. Instead it should use the same connection pooling that the metastore does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation
[ https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7246: - Attachment: HIVE-7246-2.patch Rebased patch. Hive transaction manager hardwires bonecp as the JDBC pooling implementation Key: HIVE-7246 URL: https://issues.apache.org/jira/browse/HIVE-7246 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7246-2.patch, HIVE-7246.patch Currently TxnManager hardwires BoneCP as the JDBC connection pooling implementation. Instead it should use the same connection pooling that the metastore does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041126#comment-14041126 ] Xuefu Zhang commented on HIVE-7229: --- [~HS] Thanks for working on this. It seems that your patch didn't apply to trunk somehow. Could you check/rebase if necessary? On a side note, the following is equivalent: {code} partitionValidationRegex != null && !partitionValidationRegex.equals("") {code} {code} "".equals(partitionValidationRegex) {code} String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041132#comment-14041132 ] Xuefu Zhang commented on HIVE-7229: --- Never mind about the above code snippet. String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
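The HIVE-7229 report turns on the difference between comparing string references with `!=` and checking string content. The sketch below shows that difference in isolation. `compileIfSet` is a hypothetical helper written for this example, not the actual code in `HMSHandler#init()`; the `isEmpty()` check follows the suggestion in the issue description.

```java
import java.util.regex.Pattern;

public class PartitionRegexCheck {

    /**
     * Hypothetical helper: compile the regex only when it is set and non-empty.
     * Uses isEmpty(), as the issue suggests, instead of a reference comparison.
     */
    static Pattern compileIfSet(String regex) {
        if (regex != null && !regex.isEmpty()) {
            return Pattern.compile(regex);
        }
        return null;
    }

    public static void main(String[] args) {
        // An empty string that is a distinct object from the literal "",
        // as a value read from configuration typically would be.
        String fromConfig = new String("");

        // Reference comparison: reports "different" although the content is empty.
        System.out.println("fromConfig != \"\"    : " + (fromConfig != ""));
        // Content check: correctly reports that the string is empty.
        System.out.println("fromConfig.isEmpty(): " + fromConfig.isEmpty());

        System.out.println("compiled for empty  : " + compileIfSet(fromConfig));
        System.out.println("compiled for [a-z]+ : " + compileIfSet("[a-z]+"));
    }
}
```

With the reference comparison, an empty value that is not the interned literal would pass the check and be compiled as a pattern, which is the behavior the issue describes as a bug.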
[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()
[ https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041135#comment-14041135 ] Alan Gates commented on HIVE-7249: -- TestOrcDynamicPartitioned runs fine in my tests. But I ran it as is. Did you turn on the DbTxnManager and then run the test, or run it as is? HiveTxnManager.closeTxnManger() throws if called after commitTxn() -- Key: HIVE-7249 URL: https://issues.apache.org/jira/browse/HIVE-7249 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Alan Gates Attachments: HIVE-7249.patch I openTxn() and acquireLocks() for a query that looks like INSERT INTO T PARTITION(p) SELECT * FROM T. Then I call commitTxn(). Then I call closeTxnManger() and get an exception saying lock not found (the only lock in this txn). So it seems TxnMgr doesn't know that commit released the locks. Here is the stack trace and some log output which may be useful: {noformat} 2014-06-17 15:54:40,771 DEBUG mapreduce.TransactionContext (TransactionContext.java:onCommitJob(128)) - onCommitJob(job_local557130041_0001). this=46719652 2014-06-17 15:54:40,771 DEBUG lockmgr.DbTxnManager (DbTxnManager.java:commitTxn(205)) - Committing txn 1 2014-06-17 15:54:40,771 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp 2014-06-17 15:54:40,772 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1423)) - Going to execute query select txn_state from TXNS where txn_id = 1 for update 2014-06-17 15:54:40,773 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1438)) - Going to execute update update TXNS set txn_last_heartbeat = 1403045680772 where txn_id = 1 2014-06-17 15:54:40,778 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1440)) - Going to commit 2014-06-17 15:54:40,779 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(344)) - Going to execute insert insert into COMPLETED_TXN_COMPONENTS select tc_txnid, tc_database, tc_table, tc_partition from TXN_COMPONENTS where tc_txnid = 1 2014-06-17 15:54:40,784 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(352)) - Going to execute update delete from TXN_COMPONENTS where tc_txnid = 1 2014-06-17 15:54:40,788 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(356)) - Going to execute update delete from HIVE_LOCKS where hl_txnid = 1 2014-06-17 15:54:40,791 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(359)) - Going to execute update delete from TXNS where txn_id = 1 2014-06-17 15:54:40,794 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(361)) - Going to commit 2014-06-17 15:54:40,795 WARN mapreduce.TransactionContext (TransactionContext.java:cleanup(317)) - cleanupJob(JobID=job_local557130041_0001)this=46719652 2014-06-17 15:54:40,795 DEBUG lockmgr.DbLockManager (DbLockManager.java:unlock(109)) - Unlocking id:1 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1402)) - Going to execute update update HIVE_LOCKS set hl_last_heartbeat = 1403045680796 where hl_lock_ext_id = 1 2014-06-17 15:54:40,800 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1405)) - Going to rollback 2014-06-17 15:54:40,804 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - NoSuchLockException(message:No such lock: 1) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1407) at org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:477) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:4817) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy14.unlock(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1598) at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:110) at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.close(DbLockManager.java:162) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.destruct(DbTxnManager.java:300) at org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.closeTxnManager(HiveTxnManagerImpl.java:39)
[jira] [Updated] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7090: - Attachment: HIVE-7090.2.patch Patch v2 moves the management of the temp tables completely to the client side. So changes are to HiveMetaStoreClient, rather than at the ObjectStore. Still needs more testing. Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch It's common to see sql scripts that create some temporary table as an intermediate result, run some additional queries against it and then clean up at the end. We should support temporary tables properly, meaning automatically manage the life cycle and make sure the visibility is restricted to the creating connection/session. Without these it's common to see left over tables in meta-store or weird errors with clashing tmp table names. Proposed syntax: CREATE TEMPORARY TABLE CTAS, CTL, INSERT INTO, should all be supported as usual. Knowing that a user wants a temp table can enable us to further optimize access to it. E.g.: temp tables should be kept in memory where possible, compactions and merging table files aren't required, ... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()
[ https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041158#comment-14041158 ] Eugene Koifman commented on HIVE-7249: -- yes, i did turn on DbTxnManager, but since we are creating a HCat specific API, let me retest it once that is ready HiveTxnManager.closeTxnManger() throws if called after commitTxn() -- Key: HIVE-7249 URL: https://issues.apache.org/jira/browse/HIVE-7249 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Alan Gates Attachments: HIVE-7249.patch I openTxn() and acquireLocks() for a query that looks like INSERT INTO T PARTITION(p) SELECT * FROM T. Then I call commitTxn(). Then I call closeTxnManger() I get an exception saying lock not found (the only lock in this txn). So it seems TxnMgr doesn't know that commit released the locks. Here is the stack trace and some log output which maybe useful: {noformat} 2014-06-17 15:54:40,771 DEBUG mapreduce.TransactionContext (TransactionContext.java:onCommitJob(128)) - onCommitJob(job_local557130041_0001). 
this=46719652 2014-06-17 15:54:40,771 DEBUG lockmgr.DbTxnManager (DbTxnManager.java:commitTxn(205)) - Committing txn 1 2014-06-17 15:54:40,771 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp 2014-06-17 15:54:40,772 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1423)) - Going to execute query select txn_state from TXNS where txn_id = 1 for\ update 2014-06-17 15:54:40,773 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1438)) - Going to execute update update TXNS set txn_last_heartbeat = 140304568\ 0772 where txn_id = 1 2014-06-17 15:54:40,778 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1440)) - Going to commit 2014-06-17 15:54:40,779 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(344)) - Going to execute insert insert into COMPLETED_TXN_COMPONENTS select tc_txn\ id, tc_database, tc_table, tc_partition from TXN_COMPONENTS where tc_txnid = 1 2014-06-17 15:54:40,784 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(352)) - Going to execute update delete from TXN_COMPONENTS where tc_txnid = 1 2014-06-17 15:54:40,788 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(356)) - Going to execute update delete from HIVE_LOCKS where hl_txnid = 1 2014-06-17 15:54:40,791 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(359)) - Going to execute update delete from TXNS where txn_id = 1 2014-06-17 15:54:40,794 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(361)) - Going to commit 2014-06-17 15:54:40,795 WARN mapreduce.TransactionContext (TransactionContext.java:cleanup(317)) - cleanupJob(JobID=job_local557130041_0001)this=46719652 2014-06-17 15:54:40,795 DEBUG lockmgr.DbLockManager (DbLockManager.java:unlock(109)) - Unlocking id:1 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1402)) - Going to execute update update HIVE_LOCKS set hl_last_heartbeat = 140\ 
3045680796 where hl_lock_ext_id = 1 2014-06-17 15:54:40,800 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1405)) - Going to rollback 2014-06-17 15:54:40,804 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - NoSuchLockException(message:No such lock: 1) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1407) at org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:477) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:4817) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy14.unlock(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1598) at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:110) at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.close(DbLockManager.java:162) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.destruct(DbTxnManager.java:300) at org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.closeTxnManager(HiveTxnManagerImpl.java:39) at
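The sequence in the log above (commit deletes the rows in HIVE_LOCKS, then close tries to unlock id 1) can be pictured with a toy model. This is a minimal Python sketch, not Hive code: `ToyLockStore` and `ToyTxnManager` are hypothetical stand-ins for the metastore lock table and the client-side manager, and the `forget_locks_on_commit` flag is an assumed remedy for illustration only; the attached patch may do something different.

```python
class NoSuchLock(Exception):
    """Raised when unlocking a lock id the store no longer knows."""


class ToyLockStore:
    """Hypothetical stand-in for the metastore side (the HIVE_LOCKS table)."""

    def __init__(self):
        self.locks = {}      # lock_id -> txn_id
        self._next_id = 1

    def lock(self, txn_id):
        lock_id = self._next_id
        self._next_id += 1
        self.locks[lock_id] = txn_id
        return lock_id

    def unlock(self, lock_id):
        if lock_id not in self.locks:
            raise NoSuchLock(f"No such lock: {lock_id}")
        del self.locks[lock_id]

    def commit(self, txn_id):
        # Committing removes every lock owned by the transaction,
        # like "delete from HIVE_LOCKS where hl_txnid = 1" in the log.
        self.locks = {l: t for l, t in self.locks.items() if t != txn_id}


class ToyTxnManager:
    """Hypothetical client-side manager that remembers the lock ids it took."""

    def __init__(self, store, forget_locks_on_commit):
        self.store = store
        self.forget_locks_on_commit = forget_locks_on_commit
        self.held = []

    def acquire(self, txn_id):
        self.held.append(self.store.lock(txn_id))

    def commit(self, txn_id):
        self.store.commit(txn_id)
        if self.forget_locks_on_commit:
            self.held.clear()   # assumed fix: commit already released them

    def close(self):
        for lock_id in self.held:
            self.store.unlock(lock_id)
        self.held.clear()
```

With `forget_locks_on_commit=False`, calling `acquire(1)`, `commit(1)`, then `close()` raises `NoSuchLock`, which mirrors the reported behaviour; with `True`, `close()` has nothing left to unlock.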
[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias
[ https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041183#comment-14041183 ] Harish Butani commented on HIVE-7235: - +1 lgtm TABLESAMPLE on join table is regarded as alias -- Key: HIVE-7235 URL: https://issues.apache.org/jira/browse/HIVE-7235 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7235.1.patch.txt {noformat} SELECT c_custkey, o_custkey FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on c_custkey = o_custkey; {noformat} Fails with NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041188#comment-14041188 ] David Chen commented on HIVE-7094: -- Thanks, Carl and Sushanth! Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Fix For: 0.14.0 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, HIVE-7094.5.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041250#comment-14041250 ] Eugene Koifman commented on HIVE-7090: -- If the client fails, how does the temp table get cleaned up? Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041282#comment-14041282 ] Gunther Hagleitner commented on HIVE-7159: -- Remaining failures are unrelated. For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, HIVE-7159.9.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
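The equivalence behind this rewrite can be checked on a tiny in-memory example. The following is a minimal Python sketch, not Hive's optimizer: `inner_join` and `drop_null_keys` are hypothetical helpers, and `None` stands in for SQL NULL, which never satisfies a plain `=` join condition.

```python
def inner_join(left, right):
    """Equality inner join over (key, payload) rows; a None key never matches."""
    return [
        (lk, lv, rv)
        for lk, lv in left
        for rk, rv in right
        if lk is not None and rk is not None and lk == rk
    ]


def drop_null_keys(rows):
    """The pushed-down 'key is not null' filter applied to one join source."""
    return [(k, v) for k, v in rows if k is not None]


a = [(1, "a1"), (None, "a2"), (2, "a3"), (None, "a4")]
b = [(1, "b1"), (None, "b2"), (3, "b3")]

plain = inner_join(a, b)
pushed = inner_join(drop_null_keys(a), drop_null_keys(b))
```

Both `plain` and `pushed` contain the same rows, but the filtered inputs are smaller, so fewer rows would have to be shuffled and no single reducer would receive all the null-keyed rows.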
[jira] [Updated] (HIVE-6207) Integrate HCatalog with locking
[ https://issues.apache.org/jira/browse/HIVE-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6207: - Attachment: ACIDHCatalogDesign.pdf Integrate HCatalog with locking --- Key: HIVE-6207 URL: https://issues.apache.org/jira/browse/HIVE-6207 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: ACIDHCatalogDesign.pdf, HIVE-6207.4.patch HCatalog currently ignores any locks created by Hive users. It should respect the locks Hive creates as well as create locks itself when locking is configured. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041304#comment-14041304 ] Jason Dere commented on HIVE-7090: -- The temp table scratch directory is deleted during session close, and also marked for deletion upon process close, which should clean up the directory for normal usage. If the client dies, this cleanup does not occur and the directory is left in the user's scratch directory. For HiveServer2, we could try to add cleanup thread to remove old temp table directories from the scratch directory. For other users like HiveCLI, there would probably not be any automated cleanup, similar to other stuff that could get left around in the user's scratch directory. Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch
[jira] [Commented] (HIVE-7225) Unclosed Statement's in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041306#comment-14041306 ] Alan Gates commented on HIVE-7225: -- +1 Unclosed Statement's in TxnHandler -- Key: HIVE-7225 URL: https://issues.apache.org/jira/browse/HIVE-7225 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: steve, Oh Attachments: HIVE-7225.1.patch, hive-7225.3.patch There're several methods in TxnHandler where Statement (local to the method) is not closed upon return. Here're a few examples: In compact(): {code} stmt.executeUpdate(s); LOG.debug(Going to commit); dbConn.commit(); {code} In showCompact(): {code} Statement stmt = dbConn.createStatement(); String s = select cq_database, cq_table, cq_partition, cq_state, cq_type, cq_worker_id, + cq_start, cq_run_as from COMPACTION_QUEUE; LOG.debug(Going to execute query + s + ); ResultSet rs = stmt.executeQuery(s); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
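The pattern being asked for (always release the statement, on normal return and on exception) can be shown with a small analog. This is a Python sketch using the standard `sqlite3` module, not the Java/JDBC code in TxnHandler: the cursor plays the role of the JDBC Statement, and the three-column COMPACTION_QUEUE table is a cut-down assumption made only so the example runs.

```python
import sqlite3
from contextlib import closing


def show_compact(conn):
    """Run the query and guarantee the cursor is closed before returning."""
    # closing() calls cur.close() on normal exit and when an exception is raised,
    # which is the guarantee the methods listed in this issue are missing.
    with closing(conn.cursor()) as cur:
        cur.execute("select cq_database, cq_table, cq_state from COMPACTION_QUEUE")
        return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table COMPACTION_QUEUE (cq_database text, cq_table text, cq_state text)"
)
conn.execute("insert into COMPACTION_QUEUE values ('default', 't1', 'i')")
rows = show_compact(conn)
```

In Java the same guarantee would normally come from a `finally` block or try-with-resources around the Statement.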
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~rhbutani]! For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, HIVE-7159.9.patch
[jira] [Commented] (HIVE-6469) skipTrash option in hive command line
[ https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041372#comment-14041372 ] Ravi Prakash commented on HIVE-6469: Can folks watching this JIRA please review HIVE-7100 which now has a patch? Would that be an acceptable option instead of this? skipTrash option in hive command line - Key: HIVE-6469 URL: https://issues.apache.org/jira/browse/HIVE-6469 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.12.0 Reporter: Jayesh Assignee: Jayesh Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, HIVE-6469.patch The current behavior of the hive metastore during a drop table table_name command is to delete the data from the HDFS warehouse and put it into Trash. Currently there is no way to provide a flag to tell the warehouse to skip trash while deleting table data. This ticket is to add a skipTrash configuration, hive.warehouse.data.skipTrash, which, when set to true, will skip trash while dropping table data from the hdfs warehouse. This will be set to false by default to keep current behavior. This would be a good feature to add, so that an admin of the cluster can specify when not to put data into the trash directory (e.g. in a dev environment) and thus not fill hdfs space, instead of relying on trash interval and policy configuration to take care of the disk filling issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041378#comment-14041378 ] Eugene Koifman commented on HIVE-7090: -- In that case it may make sense to generate unique names for artifacts that may be left over. The initial description in this ticket mentions 3rd party tools that will use this feature - I imagine they will generate the same Temp table name each time which may cause weird failures after crash. Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch
[jira] [Created] (HIVE-7275) optimize these functions for windowing function.
Kiet Ly created HIVE-7275: - Summary: optimize these functions for windowing function. Key: HIVE-7275 URL: https://issues.apache.org/jira/browse/HIVE-7275 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.12.0, 0.11.0 Environment: Hadoop 2.4.0, Hive 13.0 Reporter: Kiet Ly Please apply the window streaming optimization from issue HIVE-7143/7062 to these functions if they are applicable. row_number count rank dense_rank nvl rank dense_rank nvl cast decode median stddev coalesce floor sign abs ltrim substring to_char -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041399#comment-14041399 ] Jason Dere commented on HIVE-7090: -- Yes good point. The patch actually does this - each session will have its own scratch directory for temp tables, using the session ID (a UUID). Within the session's temp table scratch directory, each created temp table will get its own directory, also generated using UUID. Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch
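The layout described in this comment (one scratch directory per session keyed by the session UUID, and one UUID-named directory per temp table inside it) might look roughly like the following. This is a minimal Python sketch of the naming scheme only; `create_temp_table_dir` and the `temp_tables` path component are hypothetical and are not taken from the patch.

```python
import tempfile
import uuid
from pathlib import Path


def create_temp_table_dir(scratch_root, session_id):
    """Create <scratch_root>/<session_id>/temp_tables/<random uuid> and return it."""
    # The directory name is independent of the table name, so two sessions (or a
    # crashed and a restarted client) that reuse a table name cannot collide.
    table_dir = Path(scratch_root) / session_id / "temp_tables" / str(uuid.uuid4())
    table_dir.mkdir(parents=True)
    return table_dir


with tempfile.TemporaryDirectory() as root:
    session_a = str(uuid.uuid4())
    session_b = str(uuid.uuid4())
    first = create_temp_table_dir(root, session_a)
    second = create_temp_table_dir(root, session_a)   # same session, new table
    other = create_temp_table_dir(root, session_b)    # different session
```

Leftover directories from a dead client would still need the separate cleanup discussed earlier in the thread; unique names only prevent clashes.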
[jira] [Updated] (HIVE-7266) Optimized HashTable with vectorized map-joins results in String columns extending
[ https://issues.apache.org/jira/browse/HIVE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7266: -- Assignee: Matt McCline (was: Jitendra Nath Pandey) Optimized HashTable with vectorized map-joins results in String columns extending - Key: HIVE-7266 URL: https://issues.apache.org/jira/browse/HIVE-7266 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Matt McCline Attachments: hive-7266-small-test.tgz The following query returns different results when both vectorized mapjoin and the new optimized hashtable are enabled. {code} hive set hive.vectorized.execution.enabled=false; hive select s_suppkey, n_name from supplier, nation where s_nationkey = n_nationkey limit 25; ... 316869 JAPAN 1636869 RUSSIA 1096869 IRAN 7236869 RUSSIA 2276869 INDIA 8516869 ARGENTINA 2636869 MOZAMBIQUE 3836869 ROMANIA 2616869 FRANCE {code} But when vectorization is enabled, the results are {code} 316869 JAPAN 1636869 RUSSIA 1096869 IRANIA 7236869 RUSSIA 2276869 INDIAA 8516869 ARGENTINA 2636869 MOZAMBIQUE 3836869 ROMANIAQUE 2616869 FRANCEAQUE {code} it works correctly with vectorization when the new optimized map-join hashtable is disabled {code} hive set hive.vectorized.execution.enabled=true; hive set hive.mapjoin.optimized.hashtable=false; hive select s_suppkey, n_name from supplier, nation where s_nationkey = n_nationkey limit 25; 316869 JAPAN 1636869 RUSSIA 1096869 IRAN 7236869 RUSSIA 2276869 INDIA 8516869 ARGENTINA 2636869 MOZAMBIQUE 3836869 ROMANIA 2616869 FRANCE {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
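The corrupted values above (IRANIA, INDIAA, ROMANIAQUE, FRANCEAQUE) look like each string followed by leftover bytes from longer, earlier values. One way such output can arise is a reused byte buffer that is read with a stale length. The following is a minimal Python sketch of that failure mode only; it is an assumption for illustration, `read_with_stale_length` is a hypothetical function, and the report does not state the actual root cause in Hive.

```python
def read_with_stale_length(values):
    """Copy each value into one shared buffer, then read the whole buffer back."""
    buf = bytearray()
    out = []
    for value in values:
        data = value.encode("ascii")
        if len(data) > len(buf):
            # Grow the shared buffer; it never shrinks again.
            buf.extend(b"\x00" * (len(data) - len(buf)))
        buf[: len(data)] = data
        # Bug being illustrated: decodes len(buf) bytes instead of len(data).
        out.append(buf.decode("ascii"))
    return out


names = ["JAPAN", "RUSSIA", "IRAN", "RUSSIA", "INDIA",
         "ARGENTINA", "MOZAMBIQUE", "ROMANIA", "FRANCE"]
corrupted = read_with_stale_length(names)
```

Run on the nine nation names from the correct result, this toy reproduces the same tails as the vectorized output in the report, which is why a stale length or un-reset buffer seems a plausible place to start looking.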
[jira] [Updated] (HIVE-7090) Support session-level temporary tables in Hive
[ https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7090: - Attachment: HIVE-7090.3.patch rebase with trunk Support session-level temporary tables in Hive -- Key: HIVE-7090 URL: https://issues.apache.org/jira/browse/HIVE-7090 Project: Hive Issue Type: Bug Components: SQL Reporter: Gunther Hagleitner Assignee: Harish Butani Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch
[jira] [Commented] (HIVE-6893) out of sequence error in HiveMetastore server
[ https://issues.apache.org/jira/browse/HIVE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041510#comment-14041510 ] Gilad Wolff commented on HIVE-6893: --- I encountered the same issue, we get a socket read timeout and then out-of-sequence error. In one case we got an OOM in our client and I suspect it's the same underlying issue. Here is the metastore sequence of events. Our client tried to drop a table starting at 14:02:25. Note that we use a 20 seconds timeout for our client: {code} 2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue ip=/10.20.93.47 cmd=source:/10.20.93.47 drop_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 11: source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue ip=/10.20.93.47 cmd=source:/10.20.93.47 get_table : db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 2014-06-23 14:02:46,596 INFO hive.metastore.hivemetastoressimpl: deleting hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table 2014-06-23 14:02:46,694 INFO hive.metastore.hivemetastoressimpl: Moved to trash: hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table {code} On our client we get a socket timeout for the drop table call at 14:02:45: {code} 2:02:45.209 PM WARN 
com.cloudera.cmon.firehose.polling.hive.HiveMetastoreCanary Metastore HIVE-1-HIVEMETASTORE-627a77825bb851bf2db30317a698dded: Failed to drop table com.cloudera.cmf.cdhclient.common.hive.MetaException: java.net.SocketTimeoutException: Read timed out {code} we then try to drop the database immediately afterwards and the next message in our logs is: {code} 2:02:46.697 PM WARNcom.cloudera.cmf.cdh4client.hive.MetastoreClientImpl Could not drop hive database: cloudera_manager_metastore_canary_test_db com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TApplicationException: get_database failed: out of sequence response at com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:479) at com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl.dropDatabase(MetastoreClientImpl.java:160) {code} Note that the moved-to-trash message in the hive metastore is from 14:02:46,694 and the out-of-order exception is from 2:02:46.697. I know that order-in-time does not imply causation but is it possible that we are getting the drop-table acknowledgment message instead of the get_database message? out of sequence error in HiveMetastore server - Key: HIVE-6893 URL: https://issues.apache.org/jira/browse/HIVE-6893 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Romain Rigaux Assignee: Naveen Gangam Fix For: 0.14.0 Attachments: HIVE-6893.1.patch Calls listing databases or tables fail. It seems to be a concurrency problem. 
{code} 014-03-06 05:34:00,785 ERROR hive.log: org.apache.thrift.TApplicationException: get_databases failed: out of sequence response at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:472) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:459) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:648) at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:66) at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:278) at
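The question raised in the comment (could the client be reading the late drop_table reply while waiting for get_database?) can be illustrated with a toy model of sequence-numbered requests on one reused connection. This is a minimal Python sketch, not Thrift or Hive code: `ToyConnection`, `ToyClient`, and `OutOfSequence` are hypothetical, and the model assumes that replies come back in request order on a single socket.

```python
from collections import deque


class OutOfSequence(Exception):
    """The reply's sequence id does not match the request just sent."""


class ToyConnection:
    """One shared socket: replies are queued in the order requests were sent."""

    def __init__(self):
        self.pending = deque()

    def send(self, seqid, name):
        self.pending.append((seqid, name + " result"))

    def receive(self):
        return self.pending.popleft()


class ToyClient:
    def __init__(self, conn):
        self.conn = conn
        self.seqid = 0

    def call(self, name, timed_out=False):
        self.seqid += 1
        self.conn.send(self.seqid, name)
        if timed_out:
            # The caller gives up, but the reply is still in flight on the socket.
            raise TimeoutError(name + ": read timed out")
        seqid, payload = self.conn.receive()
        if seqid != self.seqid:
            raise OutOfSequence(name + " failed: out of sequence response")
        return payload
```

If `call("drop_table", timed_out=True)` is followed by `call("get_database")` on the same client, the second call reads the stale first reply and raises `OutOfSequence`. In this toy model, discarding the connection after a timeout avoids the mismatch.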
[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041525#comment-14041525 ] Szehon Ho commented on HIVE-7257: - [~wilbur.yang]. Thanks for this patch, I do have one concern. The query {noformat}SELECT format_number(CAST(12332.123456 AS FLOAT), 4),{noformat} shows a result like: {noformat}12,332.1230{noformat} It doesn't look correct, unless I'm missing something? I would expect 12332.1235, like it shows in decimal. UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Attachments: HIVE-7257.1.patch #1 Show the table: hive describe ssga3; OK source string test float dt timestamp Time taken: 0.243 seconds #2 Run format_number on double and it works: hive select format_number(cast(test as double),2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec MapReduce Total cumulative CPU time: 1 seconds 470 msec Ended Job = job_201403131616_0009 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 470 msec OK 
1.00 2.00 Time taken: 16.563 seconds #3 Run format_number on float and it does not work hive select format_number(test,2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0010, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100% Ended Job = job_201403131616_0010 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt Task with the most failures(4): Task ID: task_201403131616_0010_m_00 Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
{source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141) .. FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec -- This message was sent by Atlassian JIRA (v6.2#6252)
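On the 12,332.1230 versus 12332.1235 question raised in the comment on this issue: a 32-bit float cannot hold 12332.123456 exactly, so some difference from the decimal result is expected before any formatting happens. The following is a minimal Python sketch of that arithmetic, not the UDF itself: `to_float32` and `format_number` are hypothetical helpers, and Python's rounding is assumed to be close enough to the UDF's for this particular value.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def format_number(x, d):
    """Format with thousands separators and d decimal places."""
    return format(x, f",.{d}f")


stored = to_float32(12332.123456)              # value a FLOAT column would hold
as_double = format_number(12332.123456, 4)
as_float = format_number(stored, 4)
```

Near 12332 adjacent 32-bit floats are 2^-10 (about 0.00098) apart, so the nearest representable value is 12332.123046875, and formatting that to four places gives 12,332.1230. That suggests the output in the comment may reflect FLOAT precision rather than a defect in the patch, though that is for the thread to confirm.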
[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-2597: Attachment: HIVE-2597.4.patch.txt Updated XML results Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.3.patch.txt, HIVE-2597.4.patch.txt, HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. The following query does not work: select distinct client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. Seems like Hive is ignoring the aliases when the distinct keyword is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22901: Repeated key in GROUP BY is erroneously displayed when using DISTINCT
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22901/ --- Review request for hive. Bugs: HIVE-2597 https://issues.apache.org/jira/browse/HIVE-2597 Repository: hive-git Description --- The following query was simplified for illustration purposes. This works correctly: select client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. The following query does not work: select distinct client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. Seems like Hive is ignoring the aliases when the distinct keyword is specified. Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8ae1c73 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cb284d7 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java c60f56f ql/src/test/queries/clientpositive/groupby_duplicate_key.q PRE-CREATION ql/src/test/results/clientpositive/groupby_duplicate_key.q.out PRE-CREATION ql/src/test/results/compiler/plan/groupby1.q.xml af100ed ql/src/test/results/compiler/plan/groupby4.q.xml 1822733 ql/src/test/results/compiler/plan/groupby5.q.xml 0bfc684 ql/src/test/results/compiler/plan/groupby6.q.xml 5b3696c ql/src/test/results/compiler/plan/join1.q.xml e88d5dd ql/src/test/results/compiler/plan/join2.q.xml 11c44c7 ql/src/test/results/compiler/plan/join3.q.xml 6fde4e0 ql/src/test/results/compiler/plan/join4.q.xml 22a4911 ql/src/test/results/compiler/plan/join5.q.xml 5033366 ql/src/test/results/compiler/plan/join6.q.xml b1185a9 ql/src/test/results/compiler/plan/join7.q.xml a1ab3e6 ql/src/test/results/compiler/plan/join8.q.xml ba128d4 Diff: https://reviews.apache.org/r/22901/diff/ Testing --- Thanks, Navis Ryu
Review Request 22902: SerDe Properties are not considered by show create table Command
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22902/ --- Review request for hive. Bugs: HIVE-7270 https://issues.apache.org/jira/browse/HIVE-7270 Repository: hive-git Description --- The HIVE table DDl generated by show create table target_table command does not contain SerDe properties of the target table even though it contain specific SerDe properties. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java fad5ed3 ql/src/test/queries/clientpositive/show_create_table_serde.q a3eb5a8 ql/src/test/results/clientpositive/show_create_table_serde.q.out a9e92b4 Diff: https://reviews.apache.org/r/22902/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041533#comment-14041533 ]

Wilbur Yang commented on HIVE-7257:
-----------------------------------
[~szehon], thanks for the review. That particular case seems to be a quirk of floats -- I [tested System.out.println((float)12332.123456);|http://ideone.com/oP4NDJ] and it prints 12332.123. I suppose the question now is whether or not we want it to behave like this.

UDF format_number() does not work on FLOAT types
------------------------------------------------
    Key: HIVE-7257
    URL: https://issues.apache.org/jira/browse/HIVE-7257
    Project: Hive
    Issue Type: Bug
    Reporter: Wilbur Yang
    Assignee: Wilbur Yang
    Attachments: HIVE-7257.1.patch

#1 Show the table:
hive> describe ssga3;
OK
source  string
test    float
dt      timestamp
Time taken: 0.243 seconds

#2 Run format_number on double and it works:
hive> select format_number(cast(test as double),2) from ssga3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec
MapReduce Total cumulative CPU time: 1 seconds 470 msec
Ended Job = job_201403131616_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 470 msec
OK
1.00
2.00
Time taken: 16.563 seconds

#3 Run format_number on float and it does not work:
hive> select format_number(test,2) from ssga3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403131616_0010, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403131616_0010 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010
Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt
Task with the most failures(4):
Task ID: task_201403131616_0010_m_00
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
    ..
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

--
This message was sent by Atlassian JIRA (v6.2#6252)
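The float quirk mentioned in the comment above can be reproduced outside Hive. The following is a minimal sketch, not the HIVE-7257 patch: the class name `FloatFormatSketch` and the helper `formatNumber` are hypothetical, and `String.format` is used here only as a stand-in for whatever formatting the UDF performs internally. It shows that narrowing 12332.123456 to a float discards digits before any formatting happens, and that a float can simply be widened to double before being formatted.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of Hive.
class FloatFormatSketch {

    // Widen the float to a double, then format with a fixed number of decimals.
    // The widening is exact: it cannot restore digits the float never held.
    static String formatNumber(float value, int decimals) {
        double widened = value;
        return String.format(Locale.US, "%,." + decimals + "f", widened);
    }

    public static void main(String[] args) {
        float f = (float) 12332.123456;
        // The thread reports that this prints 12332.123.
        System.out.println(f);
        // The widened value shows what the float actually stores.
        System.out.println((double) f);
        // Values from the ssga3 example in the report.
        System.out.println(formatNumber(1.0f, 2));
        System.out.println(formatNumber(2.0f, 2));
    }
}
```

Whether the shortened value is acceptable is the open question in the thread; the sketch only shows where the digits are lost.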
[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7205: Status: Patch Available (was: Open) Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: 
count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint
[jira] [Updated] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
[ https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7237:
------------------------
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Ashutosh, for the review.

hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
-----------------------------------------------------------------------------
    Key: HIVE-7237
    URL: https://issues.apache.org/jira/browse/HIVE-7237
    Project: Hive
    Issue Type: Bug
    Components: Tez
    Affects Versions: 0.13.0
    Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY
    Reporter: Douglas Moore
    Assignee: Navis
    Fix For: 0.14.0
    Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt

Setting hive.exec.parallel=true causes the YARN application instance to linger forever. With hive.exec.parallel=false, the application goes away as soon as the Hive query completes.
The underlying table is an ORC store_sales table compressed with SNAPPY.
{code}
set hive.exec.parallel=true;
select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825
{code}
The query runs under Tez and finishes in 30 seconds. After 30-40 of these jobs the cluster gets to a point where no jobs will finish.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7211: Attachment: HIVE-7211.4.patch.txt Throws exception if the name of conf var starts with hive. does not exists in HiveConf Key: HIVE-7211 URL: https://issues.apache.org/jira/browse/HIVE-7211 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7211.1.patch.txt, HIVE-7211.2.patch.txt, HIVE-7211.3.patch.txt, HIVE-7211.4.patch.txt Some typos in configurations are very hard to find. -- This message was sent by Atlassian JIRA (v6.2#6252)
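The idea behind this issue, rejecting a misspelled `hive.`-prefixed variable instead of silently accepting it, can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: the class `ConfNameCheckSketch`, the `KNOWN` set, and the `verify` method are hypothetical stand-ins for whatever registry and check HiveConf actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Hive.
class ConfNameCheckSketch {

    // Assumed stand-in for the set of variables the configuration class knows about.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "hive.exec.parallel",
            "hive.optimize.correlation"));

    // Reject names that claim the "hive." namespace but are not registered.
    // Names outside that namespace are passed through untouched.
    static void verify(String name) {
        if (name.startsWith("hive.") && !KNOWN.contains(name)) {
            throw new IllegalArgumentException(
                    "hive configuration " + name + " does not exist");
        }
    }

    public static void main(String[] args) {
        verify("hive.exec.parallel");      // known variable: accepted
        verify("mapreduce.job.name");      // different namespace: not checked
        try {
            verify("hive.exec.paralel");   // typo: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Restricting the check to the `hive.` prefix keeps settings owned by other components usable, which is the scoping the issue title describes.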
Review Request 22903: Extend join transitivity PPD to non-column expressions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22903/ --- Review request for hive. Bugs: HIVE-7111 https://issues.apache.org/jira/browse/HIVE-7111 Repository: hive-git Description --- Join transitive in PPD only supports column expressions, but it's possible to extend this to generic expressions. Diffs - ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java f293c43 ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java f7a3f1c ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 7aaf455 ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicatePushDown.java e0d6aaf ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java 1476e1a ql/src/test/queries/clientpositive/auto_join33.q PRE-CREATION ql/src/test/results/clientpositive/auto_join33.q.out PRE-CREATION Diff: https://reviews.apache.org/r/22903/diff/ Testing --- Thanks, Navis Ryu
Review Request 22904: Throws exception if the name of conf var starts with hive. does not exists in HiveConf
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22904/ --- Review request for hive. Bugs: HIVE-7211 https://issues.apache.org/jira/browse/HIVE-7211 Repository: hive-git Description --- Some typos in configurations are very hard to find. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7932a3d hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 7b91e1d hbase-handler/src/test/queries/positive/hbase_stats.q 52efef5 hbase-handler/src/test/queries/positive/hbase_stats2.q 520e003 hbase-handler/src/test/queries/positive/hbase_stats3.q c3134f0 hbase-handler/src/test/results/positive/hbase_stats2.q.out 80e1c6d hbase-handler/src/test/results/positive/hbase_stats3.q.out ce7dda4 hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/SpecialCases.java 0c1fa23 hcatalog/core/src/main/java/org/apache/hive/hcatalog/rcfile/RCFileMapReduceOutputFormat.java b09ab4c hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9a89980 itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/VerifyOverriddenConfigsHook.java 41c178a itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/DummyStatsAggregator.java 1bafd97 itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/DummyStatsPublisher.java 4dd632d ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5e5cf97 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 179ad29 ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 3bc7e43 ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java 953d9b4 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java ffd7597 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java 257f186 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a988b44 ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 9b24bfd ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 464bd5e 
ql/src/test/queries/clientpositive/dbtxnmgr_compact1.q 6612fe8 ql/src/test/queries/clientpositive/dbtxnmgr_compact2.q 599cad9 ql/src/test/queries/clientpositive/dbtxnmgr_compact3.q 871d292 ql/src/test/queries/clientpositive/dbtxnmgr_showlocks.q 7c71fdd ql/src/test/queries/clientpositive/index_bitmap_compression.q 4e93275 ql/src/test/queries/clientpositive/index_compression.q 1bb29a5 ql/src/test/queries/clientpositive/join25.q 75f542d ql/src/test/queries/clientpositive/join36.q dd99d44 ql/src/test/queries/clientpositive/join37.q dc57d3a ql/src/test/queries/clientpositive/join_nulls.q 6c8ad10 ql/src/test/queries/clientpositive/join_nullsafe.q 7c3d1e8 ql/src/test/queries/clientpositive/metadata_export_drop.q e2da61a ql/src/test/queries/clientpositive/overridden_confs.q 9dcaed6 ql/src/test/queries/clientpositive/quotedid_skew.q 5c95967 ql/src/test/queries/clientpositive/skewjoin_union_remove_1.q fc07742 ql/src/test/queries/clientpositive/skewjoin_union_remove_2.q 50cfc61 ql/src/test/queries/clientpositive/skewjoinopt1.q 504ba8b ql/src/test/queries/clientpositive/skewjoinopt10.q f35af90 ql/src/test/queries/clientpositive/skewjoinopt11.q 9e00bdc ql/src/test/queries/clientpositive/skewjoinopt12.q 1719950 ql/src/test/queries/clientpositive/skewjoinopt13.q 5ef217c ql/src/test/queries/clientpositive/skewjoinopt14.q df1a26b ql/src/test/queries/clientpositive/skewjoinopt15.q 1db5472 ql/src/test/queries/clientpositive/skewjoinopt16.q 915de61 ql/src/test/queries/clientpositive/skewjoinopt17.q 2ee79cc ql/src/test/queries/clientpositive/skewjoinopt18.q 9d06cc0 ql/src/test/queries/clientpositive/skewjoinopt19.q 075645f ql/src/test/queries/clientpositive/skewjoinopt2.q f7acaad ql/src/test/queries/clientpositive/skewjoinopt20.q 9b908ce ql/src/test/queries/clientpositive/skewjoinopt3.q 22ea4f0 ql/src/test/queries/clientpositive/skewjoinopt4.q 8496b1a ql/src/test/queries/clientpositive/skewjoinopt5.q 152de5b ql/src/test/queries/clientpositive/skewjoinopt6.q 2e261bd 
ql/src/test/queries/clientpositive/skewjoinopt7.q e4d9605 ql/src/test/queries/clientpositive/skewjoinopt8.q 85746d9 ql/src/test/queries/clientpositive/skewjoinopt9.q 889ab6c ql/src/test/queries/clientpositive/smb_mapjoin_25.q e43174b ql/src/test/queries/clientpositive/stats15.q 9a557c6 ql/src/test/queries/clientpositive/truncate_table.q 975c0f1 ql/src/test/queries/clientpositive/udtf_explode.q 1d405b3 ql/src/test/queries/clientpositive/vector_decimal_mapjoin.q d8b3d1a ql/src/test/queries/clientpositive/vectorized_bucketmapjoin1.q e309713 ql/src/test/queries/clientpositive/vectorized_mapjoin.q f390c2c ql/src/test/queries/clientpositive/vectorized_nested_mapjoin.q ce4227c
[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041575#comment-14041575 ]

Gopal V commented on HIVE-7232:
-------------------------------
[~ashutoshc]: Yes, I will review this today.

VectorReduceSink is emitting incorrect JOIN keys
------------------------------------------------
    Key: HIVE-7232
    URL: https://issues.apache.org/jira/browse/HIVE-7232
    Project: Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.14.0
    Reporter: Gopal V
    Assignee: Gopal V
    Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, q5.explain.txt, q5.sql

After HIVE-7121, tpc-h query5 has resulted in incorrect results.
Thanks to [~navis], it has been tracked down to the auto-parallel settings which were initialized for ReduceSinkOperator, but not for VectorReduceSinkOperator. The vector version inherits, but doesn't call super.initializeOp() or set up the variable correctly from ReduceSinkDesc.
The query is tpc-h query5, with extra NULL checks just to be sure.
{code}
SELECT n_name,
       sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= '1994-01-01'
  AND o_orderdate < '1995-01-01'
  and l_orderkey is not null
  and c_custkey is not null
  and l_suppkey is not null
  and c_nationkey is not null
  and s_nationkey is not null
  and n_regionkey is not null
GROUP BY n_name
ORDER BY revenue DESC;
{code}
The reducer which has the issue has the following plan
{code}
Reducer 3
  Reduce Operator Tree:
    Join Operator
      condition map:
        Inner Join 0 to 1
      condition expressions:
        0 {KEY.reducesinkkey0} {VALUE._col2}
        1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
      outputColumnNames: _col0, _col3, _col10, _col11, _col14
      Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
        key expressions: _col10 (type: int)
        sort order: +
        Map-reduce partition columns: _col10 (type: int)
        Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
        value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
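The failure mode described in this issue, a subclass that overrides an initialization hook without calling the parent's version, can be shown in isolation. The sketch below is a toy model under assumptions: `BaseSinkSketch`, `BrokenVectorSinkSketch`, `FixedVectorSinkSketch`, and the `autoParallel` field are hypothetical names, not the real ReduceSinkOperator or VectorReduceSinkOperator code.

```java
// Hypothetical stand-in for a base operator that copies a setting
// from its descriptor during initialization.
class BaseSinkSketch {
    private final boolean autoParallelFromDesc;
    protected boolean autoParallel; // stays false until initializeOp() runs

    BaseSinkSketch(boolean autoParallelFromDesc) {
        this.autoParallelFromDesc = autoParallelFromDesc;
    }

    void initializeOp() {
        autoParallel = autoParallelFromDesc;
    }

    boolean isAutoParallel() {
        return autoParallel;
    }
}

// Overrides initializeOp() without delegating, so the inherited field
// silently keeps its default value.
class BrokenVectorSinkSketch extends BaseSinkSketch {
    BrokenVectorSinkSketch(boolean autoParallelFromDesc) {
        super(autoParallelFromDesc);
    }

    @Override
    void initializeOp() {
        // vector-specific setup only; base setup is skipped
    }
}

// Delegates to the parent first, then does its own setup.
class FixedVectorSinkSketch extends BaseSinkSketch {
    FixedVectorSinkSketch(boolean autoParallelFromDesc) {
        super(autoParallelFromDesc);
    }

    @Override
    void initializeOp() {
        super.initializeOp();
        // vector-specific setup would follow here
    }
}

class SinkInitDemo {
    public static void main(String[] args) {
        BaseSinkSketch broken = new BrokenVectorSinkSketch(true);
        BaseSinkSketch fixed = new FixedVectorSinkSketch(true);
        broken.initializeOp();
        fixed.initializeOp();
        System.out.println("broken: " + broken.isAutoParallel());
        System.out.println("fixed:  " + fixed.isAutoParallel());
    }
}
```

In the toy model the two subclasses end up with different values for the same configured setting, which mirrors how the vectorized and non-vectorized sinks could disagree about how keys are partitioned.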
[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041576#comment-14041576 ] Szehon Ho commented on HIVE-7257: - Ah, I guess that is most bits that fit into that float. I'm ok with the change then, +1. UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Attachments: HIVE-7257.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7194) authorization_ctas.q failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7194:
------------------------
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Thejas and Ashutosh.

authorization_ctas.q failing on trunk
-------------------------------------
    Key: HIVE-7194
    URL: https://issues.apache.org/jira/browse/HIVE-7194
    Project: Hive
    Issue Type: Task
    Components: Authorization
    Reporter: Ashutosh Chauhan
    Assignee: Thejas M Nair
    Fix For: 0.14.0
    Attachments: HIVE-7194.1.patch.txt, HIVE-7194.patch

Need to update .q.out file

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7271: - Status: Open (was: Patch Available) Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch Did some experiments to see if there's a way to speed up unit tests. TestCliDriver seemed to take a lot of time just spinning up/tearing down JVMs. I was also curious to see if running everything on a ram disk would help. Results (I ran tests up to authorization_2): - Current setup: 40 minutes - Single JVM (not using child JVM to run all queries): 8 minutes - Single JVM + ram disk: 7 minutes So the ram disk didn't help that much. But running tests in single JVM seems worthwhile doing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7271: - Attachment: HIVE-7271.6.patch .6 fixes test failures (golden files again). Also includes the renamed methods [~szehon] asked for. Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7271: - Status: Patch Available (was: Open) Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041634#comment-14041634 ] Brock Noland commented on HIVE-7271: +1 LGTM Regardless of the memory item, I updated our instance types since the c3 have a faster CPU. Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7274) Update PTest2 to JClouds 1.7.3
[ https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7274: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the review Szehon! I have committed this to trunk. Update PTest2 to JClouds 1.7.3 -- Key: HIVE-7274 URL: https://issues.apache.org/jira/browse/HIVE-7274 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-7274.patch Required to use newer instance types -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041657#comment-14041657 ] Bing Li commented on HIVE-4577: --- Hi, [~thejas] Thank you for your comments. I tried StrTokenizer, but it seems to handle only some of the scenarios, such as dfs -mkdir hello world // StrTokenizer(cmd,splitDel,doubleQuo) dfs -mkdir 'hello world // StrTokenizer(cmd,splitDel,singleQuo) But it can't handle malformed input, such as dfs -mkdir abd'dbabe'// and ' are not matched Let me know if I missed something. Thank you! hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0 Reporter: Bing Li Assignee: Bing Li Fix For: 0.14.0 Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt By design, Hive supports hadoop dfs commands in the Hive shell, like hive dfs -mkdir /user/biadmin/mydir; but it behaves differently from hadoop if the path contains spaces and quotes hive dfs -mkdir hello; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive dfs -mkdir bei jing; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing
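For illustration only, the behaviour discussed in this comment (keep quoted runs together, reject unmatched quotes) could be sketched with a small hand-rolled tokenizer. This is not the HIVE-4577 patch and it does not use StrTokenizer from Apache Commons Lang; the class name DfsCommandTokenizer and the method tokenize are hypothetical, and the choice to throw on an unmatched quote is an assumption about the desired handling of malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive code: splits a dfs command line on whitespace,
// treats '...' and "..." as single arguments, and rejects unmatched quotes.
public class DfsCommandTokenizer {

    public static List<String> tokenize(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside a quoted run"
        boolean inToken = false; // true once the current token has started

        for (int i = 0; i < cmd.length(); i++) {
            char c = cmd.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;             // closing quote: drop it, stay in the token
                } else {
                    current.append(c);     // spaces and the other quote kind are literal here
                }
            } else if (c == '\'' || c == '"') {
                quote = c;                 // opening quote: drop it, start or continue a token
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {             // whitespace outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            // Assumed policy: malformed input is an error rather than a best-effort split.
            throw new IllegalArgumentException("Unmatched " + quote + " in command: " + cmd);
        }
        if (inToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dfs -mkdir \"hello world\"")); // prints [dfs, -mkdir, hello world]
        System.out.println(tokenize("dfs -mkdir 'hello world'"));
        try {
            tokenize("dfs -mkdir abd'dbabe");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A single pass that tracks the currently open quote character handles both quote styles with one code path, which avoids choosing between a double-quote and a single-quote matcher up front.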
[jira] [Commented] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041669#comment-14041669 ] Hive QA commented on HIVE-2597: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12652100/HIVE-2597.4.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5655 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/570/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/570/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-570/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12652100 Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.3.patch.txt, HIVE-2597.4.patch.txt, HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between data. 
The following query does not work: select distinct client_tid, as myvalue1, as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated since the aliases were given. Seems like Hive is ignoring the aliases when the distinct keyword is specified.
[jira] [Commented] (HIVE-6564) WebHCat E2E tests that launch MR jobs fail on check job completion timeout
[ https://issues.apache.org/jira/browse/HIVE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041678#comment-14041678 ] Ashutosh Chauhan commented on HIVE-6564: +1 WebHCat E2E tests that launch MR jobs fail on check job completion timeout -- Key: HIVE-6564 URL: https://issues.apache.org/jira/browse/HIVE-6564 Project: Hive Issue Type: Bug Components: Tests, WebHCat Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-6564.2.patch, HIVE-6564.patch WebHCat E2E tests that fire off an MR job are not being detected as complete, so those tests time out. The problem occurs because the JSON module available through CPAN returns 1 or 0 instead of true or false. NO PRECOMMIT TESTS
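As a rough illustration of the failure mode described above, a completion check that accepts only the literal true misses a decoder that hands back 1. The sketch below shows one tolerant way to interpret such a flag. It is written in Java purely for illustration (the CPAN reference suggests the real harness is Perl), it is not the HIVE-6564 patch, and the names CompletionFlag and isComplete are hypothetical.

```java
// Hypothetical sketch, not the WebHCat test harness: normalizes a decoded
// "job complete" value that may arrive as true/false or as 1/0.
public class CompletionFlag {

    public static boolean isComplete(Object value) {
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue() != 0;   // 1 means complete, 0 means not yet
        }
        if (value instanceof String) {
            String s = ((String) value).trim();
            return s.equalsIgnoreCase("true") || s.equals("1");
        }
        return false;                                   // null or unknown type: treat as not complete
    }

    public static void main(String[] args) {
        System.out.println(isComplete(Boolean.TRUE));
        System.out.println(isComplete(1));
        System.out.println(isComplete("0"));
    }
}
```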
[jira] [Commented] (HIVE-7241) Wrong lock acquired for alter table rename partition
[ https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041679#comment-14041679 ] Ashutosh Chauhan commented on HIVE-7241: +1 Wrong lock acquired for alter table rename partition Key: HIVE-7241 URL: https://issues.apache.org/jira/browse/HIVE-7241 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7241.patch, HIVE-7241.patch Doing an alter table foo partition (bar='x') rename to partition (bar='y') acquires a read lock on table foo. It should instead acquire an exclusive lock on partition bar=x.
[jira] [Commented] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock
[ https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041685#comment-14041685 ] Ashutosh Chauhan commented on HIVE-7242: +1 alter table drop partition is acquiring the wrong type of lock -- Key: HIVE-7242 URL: https://issues.apache.org/jira/browse/HIVE-7242 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.14.0 Attachments: HIVE-7242.patch Doing an alter table foo drop partition ('bar=x') acquires a shared-write lock on partition bar=x. It should acquire an exclusive lock in that case.
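To make the difference between the lock types concrete, here is a minimal compatibility check. It assumes the common rules in which readers may coexist with a single shared-write holder while an exclusive lock conflicts with everything; this thread does not confirm that these are exactly the rules Hive's lock manager applies, and the names LockCompatibilitySketch, LockKind and compatible are hypothetical.

```java
// Hypothetical sketch, not Hive's lock manager: a lock compatibility table
// under assumed semantics (shared-read coexists with shared-write,
// exclusive coexists with nothing).
public class LockCompatibilitySketch {

    public enum LockKind { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    /** Returns true if 'requested' can be granted while 'held' is already in place. */
    public static boolean compatible(LockKind held, LockKind requested) {
        if (held == LockKind.EXCLUSIVE || requested == LockKind.EXCLUSIVE) {
            return false;                       // exclusive conflicts with every other lock
        }
        // Assumed rule: two writers conflict, any other shared combination is allowed.
        return !(held == LockKind.SHARED_WRITE && requested == LockKind.SHARED_WRITE);
    }

    public static void main(String[] args) {
        // With a shared-write lock held by the drop, a reader would still be admitted.
        System.out.println(compatible(LockKind.SHARED_WRITE, LockKind.SHARED_READ));
        // With an exclusive lock held by the drop, the reader has to wait.
        System.out.println(compatible(LockKind.EXCLUSIVE, LockKind.SHARED_READ));
    }
}
```

Under these assumed rules, a drop that holds only a shared-write lock would still admit readers of partition bar=x, which is the kind of conflict an exclusive lock rules out.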
[jira] [Commented] (HIVE-7258) Move qtest-Driver properties from pom to separate file
[ https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041691#comment-14041691 ] Gunther Hagleitner commented on HIVE-7258: -- [~szehon] - is that what you were looking for? Can you pull the values from that file? Move qtest-Driver properties from pom to separate file Key: HIVE-7258 URL: https://issues.apache.org/jira/browse/HIVE-7258 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Gunther Hagleitner Attachments: HIVE-7258.1.patch
[jira] [Assigned] (HIVE-7258) Move qtest-Driver properties from pom to separate file
[ https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reassigned HIVE-7258: Assignee: Gunther Hagleitner Move qtest-Driver properties from pom to separate file Key: HIVE-7258 URL: https://issues.apache.org/jira/browse/HIVE-7258 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Gunther Hagleitner Attachments: HIVE-7258.1.patch
[jira] [Updated] (HIVE-7258) Move qtest-Driver properties from pom to separate file
[ https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7258: - Status: Patch Available (was: Open) Move qtest-Driver properties from pom to separate file Key: HIVE-7258 URL: https://issues.apache.org/jira/browse/HIVE-7258 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Gunther Hagleitner Attachments: HIVE-7258.1.patch