[jira] [Updated] (HIVE-7185) KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false
[ https://issues.apache.org/jira/browse/HIVE-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SUYEON LEE updated HIVE-7185: - Status: Open (was: Patch Available) KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false Key: HIVE-7185 URL: https://issues.apache.org/jira/browse/HIVE-7185 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7185.patch {code} } else { t1 = soi_new.getPrimitiveWritableObject(key); t2 = soi_copy.getPrimitiveWritableObject(obj); {code} t2 should be assigned soi_new.getPrimitiveWritableObject(obj) -- This message was sent by Atlassian JIRA (v6.2#6252)
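The one-line fix can be illustrated with a minimal, self-contained model (this is not the actual KeyWrapperFactory code, which uses ObjectInspectors; the String[] "rows" and the extractor functions below are stand-ins for soi_new/soi_copy.getPrimitiveWritableObject). When isCopy is false, both objects are in the new-row representation, so both must be read through the new-row inspector; reading obj through the copy inspector compares mismatched views:

```java
import java.util.function.Function;

// Hypothetical model of the HIVE-7185 fix. Index 0 of a row stands in for
// the new-row view, index 1 for the copied view.
class TextKeyWrapperModel {
    // Stand-ins for soi_new / soi_copy .getPrimitiveWritableObject(...)
    static final Function<Object, String> soiNew = o -> ((String[]) o)[0];
    static final Function<Object, String> soiCopy = o -> ((String[]) o)[1];

    static boolean keysEqual(Object key, Object obj, boolean isCopy) {
        String t1, t2;
        if (isCopy) {
            t1 = soiCopy.apply(key);
            t2 = soiCopy.apply(obj);
        } else {
            t1 = soiNew.apply(key);
            // The fix: extract obj with soiNew as well (the bug used soiCopy here).
            t2 = soiNew.apply(obj);
        }
        return t1.equals(t2);
    }
}
```

With the buggy assignment, two objects that agree in the new-row view but differ in the copied view would spuriously compare unequal.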
[jira] [Commented] (HIVE-7236) Tez progress monitor should indicate running/failed tasks
[ https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037071#comment-14037071 ] Lefty Leverenz commented on HIVE-7236: -- Where can this be documented? Tez progress monitor should indicate running/failed tasks - Key: HIVE-7236 URL: https://issues.apache.org/jira/browse/HIVE-7236 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7236.1.patch Currently, the only logging in TezJobMonitor is for completed tasks. This makes it hard to locate task stalls and task failures. Failure scenarios are harder to debug, in particular when analyzing query runs on a cluster with bad nodes. Change the job monitor to log running failed tasks as follows. {code} Map 1: 0(+157,-1)/1755 Reducer 2: 0/1 Map 1: 0(+168,-1)/1755 Reducer 2: 0/1 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 {code} That is 189 tasks running, 1 failure and 0 complete. -- This message was sent by Atlassian JIRA (v6.2#6252)
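The proposed counter format reads as completed(+running,-failed)/total. A small illustrative parser for that shape (not the actual TezJobMonitor code; the class name and regex are assumptions made for this sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parses one vertex counter such as "0(+189,-1)/1755",
// meaning 0 complete, 189 running, 1 failed, out of 1755 tasks.
class VertexStatus {
    private static final Pattern P =
        Pattern.compile("(\\d+)\\(\\+(\\d+),-(\\d+)\\)/(\\d+)");

    // Returns {completed, running, failed, total}, or null when the
    // string is not in the running/failed-reporting form.
    static int[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) return null;
        return new int[] {
            Integer.parseInt(m.group(1)),  // completed
            Integer.parseInt(m.group(2)),  // running
            Integer.parseInt(m.group(3)),  // failed
            Integer.parseInt(m.group(4))   // total
        };
    }
}
```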
[jira] [Commented] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does
[ https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037074#comment-14037074 ] Hive QA commented on HIVE-6694: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650965/HIVE-6694.4.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5655 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/511/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/511/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-511/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650965 Beeline should provide a way to execute shell command as Hive CLI does -- Key: HIVE-6694 URL: https://issues.apache.org/jira/browse/HIVE-6694 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.11.0, 0.12.0, 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.14.0 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.patch Hive CLI allows a user to execute a shell command using ! notation. 
For instance, !cat myfile.txt. Being able to execute shell commands may be important for some users. As its replacement, however, Beeline provides no such capability, possibly because the ! notation is reserved for SQLLine commands. It's possible to provide this with a slight syntactic variation such as !sh cat myfile.txt. -- This message was sent by Atlassian JIRA (v6.2#6252)
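A minimal sketch of how such a !sh escape could be dispatched (hypothetical names, not Beeline's actual implementation; delegating to `sh -c` is an assumption about the target environment):

```java
// Sketch: recognize "!sh <cmd>" and hand the rest of the line to the
// system shell; anything else remains a SQLLine-style command.
class ShellCommandSketch {
    // Returns the shell command text, or null if the line is not a shell escape.
    static String extractShellCommand(String line) {
        String trimmed = line.trim();
        if (trimmed.startsWith("!sh ")) {
            return trimmed.substring(4).trim();
        }
        return null;
    }

    // Delegates to the platform shell and returns its exit code,
    // or -1 when the line is not a "!sh" command.
    static int run(String line) throws Exception {
        String cmd = extractShellCommand(line);
        if (cmd == null) return -1;
        Process p = new ProcessBuilder("sh", "-c", cmd).inheritIO().start();
        return p.waitFor();
    }
}
```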
[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7205: Attachment: HIVE-7205.1.patch.txt Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: dima machlin Priority: Critical Attachments: HIVE-7205.1.patch.txt Use case: table TBL (a string, b string) contains a single row: 'a','a'. The following query: {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 when set hive.optimize.correlation=true; if we instead set hive.optimize.correlation=false; it returns the correct result: a 2 The plan with correlation optimization: {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode:
hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Mux
[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037176#comment-14037176 ] Hive QA commented on HIVE-7232: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650983/HIVE-7232.1.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/513/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/513/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-513/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650983 VectorReduceSink is emitting incorrect JOIN keys Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, q5.explain.txt, q5.sql After HIVE-7121, tpc-h query5 has resulted in incorrect results. Thanks to [~navis], it has been tracked down to the auto-parallel settings which were initialized for ReduceSinkOperator, but not for VectorReduceSinkOperator. 
The vector version inherits, but doesn't call super.initializeOp() or set up the variable correctly from ReduceSinkDesc. The query is tpc-h query5, with extra NULL checks just to be sure. {code} SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' and l_orderkey is not null and c_custkey is not null and l_suppkey is not null and c_nationkey is not null and s_nationkey is not null and n_regionkey is not null GROUP BY n_name ORDER BY revenue DESC; {code} The reducer which has the issue has the following plan {code} Reducer 3 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {KEY.reducesinkkey0} {VALUE._col2} 1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3} outputColumnNames: _col0, _col3, _col10, _col11, _col14 Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col10 (type: int) sort order: + Map-reduce partition columns: _col10 (type: int) Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7205: Assignee: Navis Affects Version/s: 0.13.0 0.13.1 Status: Patch Available (was: Open) Running preliminary test Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt Use case: table TBL (a string, b string) contains a single row: 'a','a'. The following query: {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 when set hive.optimize.correlation=true; if we instead set hive.optimize.correlation=false; it returns the correct result: a 2 The plan with correlation optimization: {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string
outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Union Select Operator expressions: expr: _col0
[jira] [Commented] (HIVE-4605) Hive job fails while closing reducer output - Unable to rename
[ https://issues.apache.org/jira/browse/HIVE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037200#comment-14037200 ] George Wong commented on HIVE-4605: --- We ran into this issue in our cluster. The error message is like this {code} org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3 to: hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:470) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:407) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153) {code} The log of NameNode shows {code} 2014-06-16 20:43:38,582 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3 to /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3 because destination's parent does not 
exist {code} Hive job fails while closing reducer output - Unable to rename -- Key: HIVE-4605 URL: https://issues.apache.org/jira/browse/HIVE-4605 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Environment: OS: 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux Hadoop 1.1.2 Reporter: Link Qian Assignee: Brock Noland 1, create a table with ORC storage model create table iparea_analysis_orc (network int, ip string, ) stored as ORC; 2, insert table iparea_analysis_orc select network, ip, , the script succeeds, but fails after adding the *OVERWRITE* keyword. The main error log is listed here. java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197) at
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at
[jira] [Commented] (HIVE-4605) Hive job fails while closing reducer output - Unable to rename
[ https://issues.apache.org/jira/browse/HIVE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037203#comment-14037203 ] George Wong commented on HIVE-4605: --- I went through the code of the FileSink operator. The code is like this. {code} if ((bDynParts || isSkewedStoredAsSubDirectories) && !fs.exists(finalPaths[idx].getParent())) { fs.mkdirs(finalPaths[idx].getParent()); } {code} I am wondering why we should check bDynParts and isSkewedStoredAsSubDirectories. In the code, the output is moved to finalPath no matter what the values of bDynParts and isSkewedStoredAsSubDirectories are. Since the data move is unavoidable, why not change the code to the following to make sure the path exists before moving the file. {code} if (!fs.exists(finalPaths[idx].getParent())) { fs.mkdirs(finalPaths[idx].getParent()); } {code} Hive job fails while closing reducer output - Unable to rename -- Key: HIVE-4605 URL: https://issues.apache.org/jira/browse/HIVE-4605 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Environment: OS: 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux Hadoop 1.1.2 Reporter: Link Qian Assignee: Brock Noland 1, create a table with ORC storage model create table iparea_analysis_orc (network int, ip string, ) stored as ORC; 2, insert table iparea_analysis_orc select network, ip, , the script succeeds, but fails after adding the *OVERWRITE* keyword. The main error log is listed here.
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0 to: hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0 at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309) ... 7 more -- This message was sent by Atlassian JIRA (v6.2#6252)
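The failure mode is reproducible on a plain local filesystem. A standalone sketch (java.nio in place of HDFS's FileSystem API; the file names are illustrative): renaming into a directory whose parent does not exist fails, while creating the parent first, as suggested in the comment above, succeeds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration of the "Unable to rename output" case: a move fails when
// the destination's parent directory is missing; mkdirs-before-move works.
class RenameSketch {
    static boolean commit(Path src, Path dst, boolean mkdirsFirst) {
        try {
            if (mkdirsFirst) {
                Files.createDirectories(dst.getParent()); // the proposed mkdirs
            }
            Files.move(src, dst); // analogous to FileSystem#rename
            return true;
        } catch (IOException e) {
            return false; // destination's parent does not exist
        }
    }

    // Runs both variants against a fresh temp dir; returns
    // {renameWithoutMkdirs, renameWithMkdirs}.
    static boolean[] demo() {
        try {
            Path tmp = Files.createTempDirectory("hive4605");
            Path src1 = Files.createFile(tmp.resolve("_tmp.000000_3"));
            boolean without = commit(src1,
                tmp.resolve("-ext-10002").resolve("000000_3"), false);
            Path src2 = Files.createFile(tmp.resolve("_tmp.000001_3"));
            boolean with = commit(src2,
                tmp.resolve("-ext-10002").resolve("000001_3"), true);
            return new boolean[] { without, with };
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```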
[jira] [Commented] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does
[ https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037328#comment-14037328 ] Xuefu Zhang commented on HIVE-6694: --- [~brocknoland] Would you mind taking another look at the patch? Thanks. Beeline should provide a way to execute shell command as Hive CLI does -- Key: HIVE-6694 URL: https://issues.apache.org/jira/browse/HIVE-6694 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.11.0, 0.12.0, 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.14.0 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.patch Hive CLI allows a user to execute a shell command using ! notation. For instance, !cat myfile.txt. Being able to execute shell commands may be important for some users. As its replacement, however, Beeline provides no such capability, possibly because the ! notation is reserved for SQLLine commands. It's possible to provide this with a slight syntactic variation such as !sh cat myfile.txt. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037339#comment-14037339 ] Swarnim Kulkarni commented on HIVE-7230: I guess redundancy would have been a better word here. :) My only concern was that pulling the formatter settings from the remote guide and also checking in the formatter file seems a bit redundant to me. Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias
[ https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037348#comment-14037348 ] Hive QA commented on HIVE-7235: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651013/HIVE-7235.1.patch.txt {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5654 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/515/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/515/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-515/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12651013 TABLESAMPLE on join table is regarded as alias -- Key: HIVE-7235 URL: https://issues.apache.org/jira/browse/HIVE-7235 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7235.1.patch.txt {noformat} SELECT c_custkey, o_custkey FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on c_custkey = o_custkey; {noformat} Fails with NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
MiniTezCliDriver pre-commit tests are running
(changing subject) The MiniTezCliDriver tests have timed out lately in the pre-commit tests, reducing test coverage as Ashutosh reported. I have now configured the parallel-test framework to run MiniTezCliDriver in batches of 15 qtests, like the others. Now the timeout issue is fixed, and test reports are showing up for those. A nice side effect is that it speeds up the pre-commit tests considerably, as they were bottlenecked on running all 79 MiniTezCliDriver tests on one node. The only impact is that if you are adding new MiniTezCliDriver tests, they need to be manually added to the Ptest config on the build machine, as explained in: https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2. I've added all 79 current tests manually. This may be a bigger impact for this driver than for others, as Hive-Tez is under heavy development. I filed HIVE-7254 https://issues.apache.org/jira/browse/HIVE-7254 to explore improving it, but for now please follow that process, or notify me, to add new tests to the pre-commit test coverage. Thanks Szehon On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com wrote: + dev Good call, yep that will need to be configured. Brock On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote: I was studying this a bit more; I believe the MiniTezCliDriver tests are hitting the timeout after 2 hours, as the error code is 124. The framework is running all of them in one call, so I'll try to chunk the tests into batches like the other q-tests. I'll try to take a look next week at this. Thanks Szehon On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote: It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it is otherwise crashing. The 407 log has failures, but the 408 log is cut off.
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt The MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M. Do you guys know of any such issues? Thanks, Szehon On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com wrote: Looks like it's failing to generate test output: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/ http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt exiting with 124 here: + wait 21961 + timeout 2h mvn -B -o test -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver + ret=124 On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan hashut...@apache.org wrote: Build #407 ran MiniTezCliDriver http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/ but Build #408 didn't http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/ On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho sze...@cloudera.com wrote: Sounds like there's randomness, either in the PTest test-parser or in the maven test itself. In the history now, it's running between 5633-5707 tests, which is similar to your range. http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/ I didn't see any in the history without MiniTezCliDriver; can you point me to a build number if you see one? If nobody else knows immediately, I can dig deeper at it next week to try to find out.
On Sat, Jun 7, 2014 at 9:00 AM, Ashutosh Chauhan hashut...@apache.org wrote: I noticed that the PTest2 framework runs a different number of tests on various runs. E.g., on yesterday's runs I saw it ran 5585 and 5510 tests on subsequent runs. In particular, it seems it's running the MiniTezCliDriver tests in only half the runs. Has anyone observed this?
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7159: Status: Patch Available (was: Open) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, HIVE-7159.9.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
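The rewrite is safe for inner joins because a null key can never satisfy a non-null-safe equality, so filtering nulls at the sources changes nothing in the result while skipping the null-heavy shuffle. A toy in-memory illustration of the idea (made-up data; this is not Hive's predicate-pushdown code):

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: the pushed-down "is not null" predicates drop null-keyed rows
// from both join sources before matching, without changing the result
// of the inner join.
class NotNullPushdownSketch {
    static List<String> innerJoinKeys(List<String> aKeys, List<String> bKeys) {
        // "y is not null" pushed to source B.
        Set<String> bSide = bKeys.stream()
            .filter(Objects::nonNull)
            .collect(Collectors.toSet());
        // "x is not null" pushed to source A; null keys are never shuffled.
        return aKeys.stream()
            .filter(Objects::nonNull)
            .filter(bSide::contains)
            .collect(Collectors.toList());
    }
}
```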
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7159: Attachment: HIVE-7159.9.patch rebase
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7159: Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037453#comment-14037453 ] Harish Butani commented on HIVE-7159: - The prunedCols contain columns from the inputRR; the parent pruner will have set this up.
[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
[ https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037466#comment-14037466 ] Hive QA commented on HIVE-7237: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651056/HIVE-7237.1.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/516/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/516/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-516/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651056 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever - Key: HIVE-7237 URL: https://issues.apache.org/jira/browse/HIVE-7237 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.13.0 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY Reporter: Douglas Moore Assignee: Navis Attachments: HIVE-7237.1.patch.txt set hive.exec.parallel=true; will cause the YARN application instance to linger forever. With set hive.exec.parallel=false, the application goes away as soon as the Hive query is complete.
The underlying table is an ORC store_sales table compressed with SNAPPY. {code} set hive.exec.parallel=true; select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825; {code} The query runs under Tez and finishes in 30 seconds. After 30-40 of these jobs, the cluster gets to a point where no jobs will finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input
[ https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037551#comment-14037551 ] Wilbur Yang commented on HIVE-6637: --- Regarding the currently submitted patch: I agree with the type checking for both arguments, and I think that this code will work in terms of functionality. However, I believe there's a slight problem: if a user runs a query with, say, an INT as the second argument, then the error message will be The 2nd argument of function IN_FILE must be a string, char or varchar... when it really must be a constant string. UDF in_file() doesn't take CHAR or VARCHAR as input --- Key: HIVE-6637 URL: https://issues.apache.org/jira/browse/HIVE-6637 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ashish Kumar Singh Attachments: HIVE-6637.patch {code} hive> desc alter_varchar_1; key string None value varchar(3) None key2 int None value2 varchar(10) None hive> select in_file(value, value2) from alter_varchar_1; FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 'value': The 1st argument of function IN_FILE must be a string but org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a was given. {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
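The point about the message can be sketched in a few lines (plain Python, not the actual GenericUDFInFile code; the message text is modeled on the one quoted above): check the type family and constant-ness together, and report both.

```python
# Illustrative argument check (not Hive's implementation): the second
# argument must be BOTH string-family typed and a compile-time constant,
# and the error message should say so.
STRING_TYPES = {"string", "char", "varchar"}

def check_second_arg(arg_type, is_constant):
    """Raise TypeError unless arg_type is a constant string-family type."""
    if arg_type not in STRING_TYPES or not is_constant:
        raise TypeError(
            "The 2nd argument of function IN_FILE must be a constant "
            "string, char or varchar, but a %s %s was given."
            % ("constant" if is_constant else "non-constant", arg_type))

check_second_arg("varchar", True)        # passes silently
try:
    check_second_arg("int", True)        # wrong type family
except TypeError as e:
    print(e)
```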
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7257) UDF format_number() does not work on FLOAT types
Wilbur Yang created HIVE-7257: - Summary: UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang #1 Show the table: hive> describe ssga3; OK source string, test float, dt timestamp Time taken: 0.243 seconds #2 Run format_number on double and it works: hive> select format_number(cast(test as double),2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec MapReduce Total cumulative CPU time: 1 seconds 470 msec Ended Job = job_201403131616_0009 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 470 msec OK 1.00 2.00 Time taken: 16.563 seconds #3 Run format_number on float and it does not work: hive> select format_number(test,2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0010, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010 Hadoop
job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100% Ended Job = job_201403131616_0010 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt Task with the most failures(4): Task ID: task_201403131616_0010_m_00 Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141) .. FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037589#comment-14037589 ] Ashutosh Chauhan commented on HIVE-7159: Correct, prunedCols can only be a subset of the RowSchema, so if the sizes match there is no need for a Select operator. +1
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037631#comment-14037631 ] David Chen commented on HIVE-7230: -- [~swarnim], in my patch, I am pointing the Maven Eclipse plugin to the formatter file that sits in the root of the source tree and not the remote guide. Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7250: - Attachment: HIVE-7250.5.patch Added the missing Apache license header in the newly added unit tests. Also made the unit tests not dependent on -Xmx. Adaptive compression buffer size for wide tables in ORC --- Key: HIVE-7250 URL: https://issues.apache.org/jira/browse/HIVE-7250 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, HIVE-7250.4.patch, HIVE-7250.5.patch If the input table is wide (on the order of 1000s of columns), ORC compression buffer size overhead becomes significant, causing OOM issues. To overcome this, the buffer size should be adaptively chosen based on the available memory and the number of columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
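As a rough illustration of the proposal, here is a Python sketch of one possible heuristic. The default size, per-column stream count, and memory budget are invented for the example; this is not the actual patch logic.

```python
def adaptive_buffer_size(default_size, n_columns, memory_budget,
                         streams_per_column=4, min_size=4 * 1024):
    """Pick a compression buffer size so that the estimated total
    (columns x streams x buffer) stays within the memory budget.
    The stream count and size floor are illustrative only."""
    total = n_columns * streams_per_column * default_size
    if total <= memory_budget:
        return default_size
    size = memory_budget // (n_columns * streams_per_column)
    # Round down to a power of two, but never below the floor.
    size = max(min_size, 1 << max(size.bit_length() - 1, 0))
    return min(default_size, size)

# A narrow table keeps the 256 KB default; a 3000-column table,
# under a 128 MB budget, shrinks the buffer substantially.
print(adaptive_buffer_size(256 * 1024, 10, 128 * 1024 * 1024))
print(adaptive_buffer_size(256 * 1024, 3000, 128 * 1024 * 1024))
```

The key property is that the estimated total buffer memory is bounded by the budget regardless of how wide the table is, which is the OOM the ticket describes.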
[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7250: - Attachment: HIVE-7250.5.patch
[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7250: - Attachment: (was: HIVE-7250.5.patch)
Re: Review Request 22772: HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22772/ --- (Updated June 19, 2014, 6:55 p.m.) Review request for hive. Changes --- Made changes based on comment on JIRA. Bugs: HIVE-6637 https://issues.apache.org/jira/browse/HIVE-6637 Repository: hive-git Description (updated) --- HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java ea52537d0b85191f0b633a29aa3f7ddb556c288d ql/src/test/queries/clientpositive/udf_in_file.q 9d9efe8e23d6e73429ee5cd2c8470359ba2b3498 ql/src/test/results/clientpositive/udf_in_file.q.out b63143760d80f3f6a8ba0a23c0d87e8bb86fce66 Diff: https://reviews.apache.org/r/22772/diff/ Testing --- Tested with qtest. Thanks, Ashish Singh
[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input
[ https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-6637: - Attachment: (was: HIVE-6637.patch)
[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input
[ https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-6637: - Attachment: HIVE-6637.1.patch
[jira] [Commented] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input
[ https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037681#comment-14037681 ] Ashish Kumar Singh commented on HIVE-6637: -- [~wilbur.yang] good point. Updated the patch and RB.
[jira] [Commented] (HIVE-7251) Fix StorageDescriptor usage in unit tests
[ https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037754#comment-14037754 ] Hive QA commented on HIVE-7251: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651195/HIVE-7251.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5639 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/517/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/517/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-517/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651195 Fix StorageDescriptor usage in unit tests -- Key: HIVE-7251 URL: https://issues.apache.org/jira/browse/HIVE-7251 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.1 Reporter: Pankit Thapar Priority: Minor Attachments: HIVE-7251.patch Current approach: The StorageDescriptor class is used to describe parameters like InputFormat, OutputFormat, SerDeInfo, etc. for a Hive table.
Some of the class variables, like InputFormat, OutputFormat, and SerDeInfo.serializationLib, are required fields when creating a StorageDescriptor object. For example, the createTable command in the metastore client creates the table with the default values of such variables defined in HiveConf or hive-default.xml. But in unit tests the table is created in a slightly different way, so these values need to be set explicitly. Thus, when creating tables in tests, the required fields of the StorageDescriptor object need to be set. Issue with the current approach: from some of the current usages of this class in unit tests, I noticed that when any test case tried to clean up the database and found a table created by a previously executed test case, the cleanup process tries to fetch the Table object and performs sanity checks, which include checking for required fields like InputFormat, OutputFormat, and SerDeInfo.serializationLib of the table. The sanity checks fail, which results in failure of the test case. Fix: in unit tests, the StorageDescriptor object should be created with the fields that are sanity-checked when trying to fetch the table. NOTE: This fix addresses 6 test cases in itests/hive-unit/ -- This message was sent by Atlassian JIRA (v6.2#6252)
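The failure mode described can be sketched abstractly. This is plain Python rather than the metastore API; the field names mirror the ones listed above, and the example class names are the usual Hive defaults, used here only as illustration.

```python
# Abstract sketch of the issue: a table's storage descriptor must carry
# certain required fields, and a later fetch that sanity-checks them
# fails if a test created the table without setting them.
REQUIRED = ("inputFormat", "outputFormat", "serializationLib")

def sanity_check(sd):
    """Return the list of required fields missing from a descriptor."""
    return [f for f in REQUIRED if not sd.get(f)]

incomplete = {"location": "/tmp/t1"}  # a test that sets only what it needs
complete = {
    "location": "/tmp/t2",
    "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "serializationLib":
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
}

print(sanity_check(incomplete))  # the cleanup path would fail here
print(sanity_check(complete))    # nothing missing
```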
[jira] [Updated] (HIVE-7251) Fix StorageDescriptor usage in unit tests
[ https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7251: --- Assignee: Pankit Thapar
[jira] [Commented] (HIVE-7251) Fix StorageDescriptor usage in unit tests
[ https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037768#comment-14037768 ] Ashutosh Chauhan commented on HIVE-7251: +1
[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037771#comment-14037771 ] Prasanth J commented on HIVE-7219: -- bq. Question: Should the following information from Prasanth J also be documented, and if so does it belong in the ORC wikidoc or with the parameter description in Configuration Properties? bq. For integers, this patch will improve only very specific cases. If the encoding uses SHORT_REPEAT, DELTA (esp. fixed delta), PATCHED_BLOB then this patch will NOT have any effect, as these encodings do not use bit packing. The bit-packed encodings like DIRECT and DELTA (variable delta) will see improvements. I think these are too specific to be put into user documentation. Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, HIVE-7219.4.patch, orc-read-perf-jmh-benchmark.png ORC uses serialization utils heavily for reading and writing data. The bit-packing and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also, double reader/writer performance can be improved by bulk reading/writing from/to a byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
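For background on what writeInts()/readInts() do, here is a Python sketch of fixed-bit-width packing and unpacking. This is illustrative only: the actual ORC code works on Java long arrays, and the patch's gain comes from unrolling these inner bit loops per bit width, not from the Python shown here.

```python
def pack_ints(values, bit_width):
    """Pack non-negative ints of a fixed bit width into bytes, MSB-first,
    in the style of a bit-packed integer stream."""
    out = bytearray((len(values) * bit_width + 7) // 8)
    bit_pos = 0
    for v in values:
        for b in range(bit_width - 1, -1, -1):
            if (v >> b) & 1:
                out[bit_pos >> 3] |= 1 << (7 - (bit_pos & 7))
            bit_pos += 1
    return bytes(out)

def unpack_ints(data, bit_width, count):
    """Inverse of pack_ints: read `count` fixed-width ints back out."""
    values, bit_pos = [], 0
    for _ in range(count):
        v = 0
        for _ in range(bit_width):
            bit = (data[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1
            v = (v << 1) | bit
            bit_pos += 1
        values.append(v)
    return values

vals = [3, 7, 0, 5, 2, 6, 1, 4]
packed = pack_ints(vals, 3)  # 8 values x 3 bits = 3 bytes
assert unpack_ints(packed, 3, len(vals)) == vals
```

The per-bit inner loops above are exactly the kind of code that benefits from being unrolled into width-specific variants, which is what the patch does for the bit-packed DIRECT and variable-delta DELTA encodings.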
[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7051: --- Component/s: Statistics Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION - Key: HIVE-7051 URL: https://issues.apache.org/jira/browse/HIVE-7051 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Ashutosh Chauhan Attachments: HIVE-7051.1.patch Same as HIVE-7050 but for partitions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7051: --- Attachment: HIVE-7051.1.patch Patch to implement the same.
[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7051: --- Status: Patch Available (was: Open)
[jira] [Assigned] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned HIVE-7051: -- Assignee: Ashutosh Chauhan Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION - Key: HIVE-7051 URL: https://issues.apache.org/jira/browse/HIVE-7051 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Ashutosh Chauhan Attachments: HIVE-7051.1.patch Same as HIVE-7050 but for partitions -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22782: Display partition level column stats
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22782/ --- Review request for hive and Gunther Hagleitner. Bugs: HIVE-7051 https://issues.apache.org/jira/browse/HIVE-7051 Repository: hive-git Description --- Display partition level column stats Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java fad5ed3 ql/src/test/queries/clientpositive/columnstats_partlvl.q 8bf6c70 ql/src/test/results/clientpositive/columnstats_partlvl.q.out a4c4677 Diff: https://reviews.apache.org/r/22782/diff/ Testing --- Added new tests Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7255) Allow partial partition spec in analyze command
[ https://issues.apache.org/jira/browse/HIVE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037804#comment-14037804 ] Ashutosh Chauhan commented on HIVE-7255: To enable further testing of this, HIVE-7051 will be of great help. [~hagleitn] Can you take a look at that first? Allow partial partition spec in analyze command --- Key: HIVE-7255 URL: https://issues.apache.org/jira/browse/HIVE-7255 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7255.1.patch So that stats collection can happen for multiple partitions through one statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7098) RecordUpdater should extend RecordWriter
[ https://issues.apache.org/jira/browse/HIVE-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037817#comment-14037817 ] Alan Gates commented on HIVE-7098: -- Ran the failing tests locally, all of which pass except root_dir_external_table which fails on trunk as well. So I conclude none of these are issues for this patch. RecordUpdater should extend RecordWriter Key: HIVE-7098 URL: https://issues.apache.org/jira/browse/HIVE-7098 Project: Hive Issue Type: Sub-task Components: File Formats, Transactions Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7098.patch A new interface ql.io.RecordUpdater was added as part of the ACID work in 0.13. This interface should extend RecordWriter because: # If it does not significant portions of FileSinkOperator will have to be reworked to handle both RecordWriter and RecordUpdater # Once a file format accepts transactions, it should not generally be possible to write using RecordWriter.write as that will write old style records without transaction information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037822#comment-14037822 ] Sushanth Sowmyan commented on HIVE-7223: Is there a need for a List<Partition> getPartitions() when there is already an Iterator<Partition> being exposed? If we do a PartitionSpec interface such as this, I'd like to keep it as lean as possible. Also, I wonder if Map<Key,Value> isn't a more fundamental interface point, instead of a String partition-name, since partition names can be arbitrary, but partition key-values define a partition. To wit, what do you think about a PartitionSpec interface that looks like this: {code} public interface PartitionSpec { public Iterator<Partition> getPartitionsIter(); public Iterator<Map<String,String>> getPartKeyValuesIter(); } {code} Thoughts? Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using List<Partition>. E.g. {code} public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts); public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts); public int add_partitions(List<Partition> new_parts); {code} Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition-values, etc. Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense of serializing and deserializing metadata for large sets of partitions, w.r.t. time and heap-space. Reducing the Thrift traffic should help in this regard. 
In a date-partitioned table, all sub-partitions for a particular date are *likely* (but not expected) to have: # The same base directory (e.g. {{/feeds/search/20140601/}}) # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}}) # The same SerDe/StorageHandler/IOFormat classes # Sorting/Bucketing/SkewInfo settings In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition-list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory. We can go one better for the {{add_partitions()}} case: when adding all partitions for a given date, the “normal” case affords us the ability to specify the top-level date-directory, where sub-partitions can be inferred from the HDFS directory-path. These extensions are hard to introduce at the metastore-level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help: {code} public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ; public int add_partitions( PartitionSpec new_parts ) throws … ; {code} where the PartitionSpec looks like: {code} public interface PartitionSpec { public List<Partition> getPartitions(); public List<String> getPartNames(); public Iterator<Partition> getPartitionIter(); public Iterator<String> getPartNameIter(); } {code} For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them. Similarly, list_partitions() could return a List<PartitionSpec>, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore need instantiate all partitions at once. That should help with memory requirements. 
In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse than the status quo. PartitionSpec abstracts away how a set of partitions may be represented. A tighter representation allows us to communicate metadata for a larger number of Partitions, with less Thrift traffic. Given that Thrift doesn’t support polymorphism, we’d have to implement the PartitionSpec as a Thrift union of supported implementations. (We could convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec sub-class.) Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
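As a concrete illustration of the proposed interface and its {{DefaultPartitionSpec}} fallback, here is a minimal, self-contained Java sketch. The stand-in Partition class and the '/'-joined partition-name scheme are assumptions made for the example only; they are not the Hive metastore's actual Partition type or naming logic.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in for org.apache.hadoop.hive.metastore.api.Partition (assumption:
// the real class is far heavier, carrying a StorageDescriptor per instance).
class Partition {
    private final List<String> values;
    Partition(List<String> values) { this.values = values; }
    List<String> getValues() { return values; }
}

// The PartitionSpec interface as proposed in the JIRA description.
interface PartitionSpec {
    List<Partition> getPartitions();
    List<String> getPartNames();
    Iterator<Partition> getPartitionIter();
    Iterator<String> getPartNameIter();
}

// DefaultPartitionSpec: the fallback that simply composes a List<Partition>;
// no better or worse than the status quo.
class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;
    DefaultPartitionSpec(List<Partition> partitions) { this.partitions = partitions; }
    public List<Partition> getPartitions() { return partitions; }
    public List<String> getPartNames() {
        // Hypothetical naming scheme: join partition values with '/'.
        return partitions.stream()
                .map(p -> String.join("/", p.getValues()))
                .collect(Collectors.toList());
    }
    public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
    public Iterator<String> getPartNameIter() { return getPartNames().iterator(); }
}

public class PartitionSpecSketch {
    public static void main(String[] args) {
        PartitionSpec spec = new DefaultPartitionSpec(Arrays.asList(
                new Partition(Arrays.asList("dt=20140601", "region=US")),
                new Partition(Arrays.asList("dt=20140601", "region=UK"))));
        // Iterator semantics: callers need not materialize all partitions.
        for (Iterator<String> it = spec.getPartNameIter(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

An {{HDFSDirBasedPartitionSpec}} would implement the same interface while holding one top-level directory and a single shared StorageDescriptor, constructing Partition instances lazily in getPartitionIter().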
Re: Review Request 22770: Allow partial partition spec in analyze command
On June 19, 2014, 3:51 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 73 https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line73 Looks like it's no longer used. Will get rid of it. On June 19, 2014, 3:51 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 422 https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line422 We can remove the 'totalRows' variable as well. Yup, will remove. On June 19, 2014, 3:51 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 425 https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line425 Do we need to clean up the fetch operator at some point? Yeah, will do. On June 19, 2014, 3:51 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java, line 65 https://reviews.apache.org/r/22770/diff/1/?file=612849#file612849line65 Same question: can't we still display the partial partition spec? Not sure if it's of any use in explain, but I'll see if I can add it back. - Ashutosh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22770/#review46179 --- On June 19, 2014, 2:35 a.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22770/ --- (Updated June 19, 2014, 2:35 a.m.) Review request for hive. Bugs: HIVE-7255 https://issues.apache.org/jira/browse/HIVE-7255 Repository: hive-git Description --- So that stats collection can happen for multiple partitions through one statement. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8ae1c73 ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 47a6871 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 1270520 ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java a4ba4bd ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java b75f78c ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java 86e6877 ql/src/test/queries/clientnegative/columnstats_partlvl_dp.q b4887c4 ql/src/test/queries/clientnegative/columnstats_partlvl_incorrect_num_keys.q 2f8e927 ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q PRE-CREATION ql/src/test/queries/clientpositive/columnstats_partlvl_incorrect_num_keys.q PRE-CREATION ql/src/test/results/clientnegative/columnstats_partlvl_invalid_values.q.out d48d8cb ql/src/test/results/clientpositive/columnstats_partlvl.q.out a4c4677 ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out PRE-CREATION ql/src/test/results/clientpositive/columnstats_partlvl_incorrect_num_keys.q.out PRE-CREATION Diff: https://reviews.apache.org/r/22770/diff/ Testing --- Moved -ve tests to +ve. Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037837#comment-14037837 ] Alan Gates commented on HIVE-7223: -- Seems reasonable. For backwards compatibility you will want to leave the existing thrift calls there and create new ones that handle your new union type. We can mark the old calls as deprecated and remove them after 0.14, but we should give people at least one release to move. Your iterator model would seem to imply that you could fetch just some partitions from the metastore in a thrift call, and go back for more later. Is that part of your plan? Part of a later phase? Or just an unintended side effect? Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037835#comment-14037835 ] Sushanth Sowmyan commented on HIVE-7223: Hm.. I just realized that as far as the metastore db definitions go, we define a unique key constraint on partition name, but make no such guarantee on the key-values themselves, which are not metastore entities. I don't like it, but I see. I'll have to dig a bit more into this. Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037843#comment-14037843 ] Gunther Hagleitner commented on HIVE-7220: -- I think we should move forward with this; it will give us a working build while we work out MAPREDUCE-5756. We have HIVE-6401 open to handle the situation when we get a fix. I've reviewed the patch, and it looks good except for the isValidSplit call. Why is that needed? You prune in the constructor, so presumably you never get splits containing folders. If this is just a sanity check, it should probably throw an assertion if there are still paths in there. If not, it seems incorrect to throw out splits that don't match (especially since you might throw out combined valid locations with them). Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.patch While looking at the root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. Tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
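The suggestion above (prune directories when splits are built, and assert, rather than silently filter, if one slips through later) can be illustrated with a small sketch. This is a hypothetical stand-alone example using java.nio.file in place of Hadoop's FileSystem API; the method names pruneDirectories and checkSplitLocations are invented for illustration and are not from the HIVE-7220 patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitPruneSketch {
    // Keep only regular files; empty directories (the failure mode in the
    // JIRA) never make it into the split list in the first place.
    static List<Path> pruneDirectories(List<Path> candidates) {
        return candidates.stream()
                .filter(Files::isRegularFile)
                .collect(Collectors.toList());
    }

    // Sanity check in the spirit of the review comment: after pruning up
    // front, a directory in a split indicates a bug, so assert instead of
    // quietly discarding the (possibly combined, partly valid) split.
    static void checkSplitLocations(List<Path> locations) {
        for (Path p : locations) {
            assert !Files.isDirectory(p) : "directory leaked into split: " + p;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("extTable");
        Files.createDirectory(root.resolve("emptyDir"));   // like '/Users'
        Files.createFile(root.resolve("part-00000"));      // real data file
        try (Stream<Path> entries = Files.list(root)) {
            List<Path> pruned = pruneDirectories(entries.collect(Collectors.toList()));
            checkSplitLocations(pruned);
            System.out.println(pruned.size()); // prints 1: only the data file survives
        }
    }
}
```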
[jira] [Commented] (HIVE-6967) Hive transaction manager fails when SQLServer is used as an RDBMS
[ https://issues.apache.org/jira/browse/HIVE-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037842#comment-14037842 ] Alan Gates commented on HIVE-6967: -- Failed tests pass when run locally, except for root_dir_external_table which fails against trunk, so I don't think any of these failures relate to this patch. Hive transaction manager fails when SQLServer is used as an RDBMS - Key: HIVE-6967 URL: https://issues.apache.org/jira/browse/HIVE-6967 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-6967.patch When using SQLServer as an RDBMS for the metastore, any transaction or DbLockMgr operations fail with: {code} MetaException(message:Unable to select from transaction database com.microsoft.sqlserver.jdbc.SQLServerException: Line 1: FOR UPDATE clause allowed only for DECLARE CURSOR. {code} The issue is that SQLServer does not support the FOR UPDATE clause in SELECT. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
[ https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037852#comment-14037852 ] Gunther Hagleitner commented on HIVE-7051: -- +1 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION - Key: HIVE-7051 URL: https://issues.apache.org/jira/browse/HIVE-7051 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Ashutosh Chauhan Attachments: HIVE-7051.1.patch Same as HIVE-7050 but for partitions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7255) Allow partial partition spec in analyze command
[ https://issues.apache.org/jira/browse/HIVE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037857#comment-14037857 ] Gunther Hagleitner commented on HIVE-7255: -- Sure. HIVE-7051 looks good to me. Allow partial partition spec in analyze command --- Key: HIVE-7255 URL: https://issues.apache.org/jira/browse/HIVE-7255 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7255.1.patch So that stats collection can happen for multiple partitions through one statement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7202) DbTxnManager deadlocks in hcatalog.cli.TestSematicAnalysis.testAlterTblFFpart()
[ https://issues.apache.org/jira/browse/HIVE-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037885#comment-14037885 ] Alan Gates commented on HIVE-7202: -- Tests pass when run locally except root_dir_external_table and authorization_ctas, both of which also fail on trunk. So I do not think these failures are related to the patch. DbTxnManager deadlocks in hcatalog.cli.TestSematicAnalysis.testAlterTblFFpart() --- Key: HIVE-7202 URL: https://issues.apache.org/jira/browse/HIVE-7202 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Alan Gates Fix For: 0.14.0 Attachments: HIVE-7202.patch select * from HIVE_LOCKS produces {noformat} 6 |1 |0 |default |junit_sem_analysis |NULL |w|r|1402354627716 |NULL|unknown |ekoifman.local 6 |2 |0 |default |junit_sem_analysis |b=2010-10-10 |w|e|1402354627716 |NULL|unknown |ekoifman.local 2 rows selected {noformat} easiest way to repro this is to add hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true); hiveConf.setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER, org.apache.hadoop.hive.ql.lockmgr.DbTxnManager); in HCatBaseTest.setUpHiveConf() -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock
[ https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7242: - Status: Open (was: Patch Available) Forgot to wait for HIVE-7202 to be checked in before I marked patch available. alter table drop partition is acquiring the wrong type of lock -- Key: HIVE-7242 URL: https://issues.apache.org/jira/browse/HIVE-7242 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.14.0 Attachments: HIVE-7242.patch Doing an alter table foo drop partition ('bar=x') acquired a shared-write lock on partition bar=x. It should be acquiring an exclusive lock in that case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7241) Wrong lock acquired for alter table rename partition
[ https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7241: - Status: Open (was: Patch Available) Need to wait for HIVE-7202 before marking this one patch available. Wrong lock acquired for alter table rename partition Key: HIVE-7241 URL: https://issues.apache.org/jira/browse/HIVE-7241 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7241.patch Doing an alter table foo partition (bar='x') rename to partition (bar='y') acquires a read lock on table foo. It should instead acquire an exclusive lock on partition bar=x. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037893#comment-14037893 ] Carl Steinbach commented on HIVE-7094: -- [~sushanth]: I'm planning to commit this patch tonight. Please let me know if I should hold off. Thanks. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, HIVE-7094.5.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037892#comment-14037892 ] Gunther Hagleitner commented on HIVE-7254: -- Would moving the minimr/tez/etc properties to a separate file and using http://mojo.codehaus.org/properties-maven-plugin/ be an option? Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
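The properties-maven-plugin idea above could, as a rough sketch, look like the following fragment for the itests/qtest pom: the per-driver qfile lists move into a shared properties file that both Maven and the Ptest framework read. The file name testconfiguration.properties and the plugin version shown are assumptions for illustration, not the committed configuration.

```xml
<!-- Hypothetical sketch for itests/qtest/pom.xml: load the shared qfile
     lists from a properties file instead of hard-coding them in the pom. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>properties-maven-plugin</artifactId>
  <version>1.0-alpha-2</version>
  <executions>
    <execution>
      <phase>initialize</phase>
      <goals>
        <goal>read-project-properties</goal>
      </goals>
      <configuration>
        <files>
          <!-- e.g. minimr.query.files=..., minitez.query.files=... -->
          <file>${basedir}/testconfiguration.properties</file>
        </files>
      </configuration>
    </execution>
  </executions>
</plugin>
```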
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037910#comment-14037910 ] Mithun Radhakrishnan commented on HIVE-7223: [~sushanth]: Yep, you're right. I was kinda hoping that'd be brought up in the review. It'd be trivial to construct a full ListPartition from just the iterator. [~alangates]: I agree. These ought to be parallel APIs until some point after 0.14. bq. Your iterator model would seem to imply that you could fetch just some partitions from the metastore in a thrift call, and go back for more later. Is that part of your plan? Yes, that's definitely on the cards. As per [~sershe]'s comments on HIVE-7195, it should be possible to capture cursor/transaction-like semantics, allowing us to batch the partitions returned from {{listPartitions()}}. I would like to first focus on the {{addPartitions(PartitionSpec)}} APIs for Falcon (etc.), and leave the PartitionSpec based {{listPartitions()}} API use {{DefaultPartitionSpec}}. Or maybe a {{CompressedPartitionSpec}} that shares StorageDescriptor instances. We can put in batched reads in a later JIRA, since it's so much more work. Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using ListPartition. E.g. {code} public ListPartition listPartitions(String db_name, String tbl_name, short max_parts); public ListPartition listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts); public int add_partitions(ListPartition new_parts); {code} Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition-values, etc. 
Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense of serializing and deserializing metadata for large sets of partitions, w.r.t. time and heap-space. Reducing the thrift traffic should help in this regard. In a date-partitioned table, all sub-partitions for a particular date are *likely* (but not guaranteed) to have: # The same base directory (e.g. {{/feeds/search/20140601/}}) # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}}) # The same SerDe/StorageHandler/IOFormat classes # Sorting/Bucketing/SkewInfo settings In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition-list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory. We can go one better for the {{add_partitions()}} case: When adding all partitions for a given date, the “normal” case affords us the ability to specify the top-level date-directory, where sub-partitions can be inferred from the HDFS directory-path. These extensions are hard to introduce at the metastore-level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help: {code}
public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
public int add_partitions(PartitionSpec new_parts) throws ...;
{code} where the PartitionSpec looks like: {code}
public interface PartitionSpec {
  public List<Partition> getPartitions();
  public List<String> getPartNames();
  public Iterator<Partition> getPartitionIter();
  public Iterator<String> getPartNameIter();
}
{code} For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them. Similarly, list_partitions() could return a List<PartitionSpec>, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore needs to instantiate all partitions at once. That should help with memory requirements. In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse than the status quo. PartitionSpec abstracts away how a set of partitions may be represented. A tighter representation allows us to communicate metadata for a
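The proposed interface and its {{DefaultPartitionSpec}} fallback can be sketched as a self-contained stand-alone example. Note this is an illustrative sketch of the shape described above, not the actual metastore code: the {{Partition}} class here is a minimal stand-in for the Thrift-generated one, and the constructor is invented for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in for the metastore's Partition, just to keep the sketch self-contained.
class Partition {
    final List<String> values;  // partition key values, e.g. [20140601, US]
    Partition(List<String> values) { this.values = values; }
    String name() { return String.join("/", values); }
}

// The proposed abstraction: how a set of partitions is represented is hidden
// behind this interface, so a condensed representation can be swapped in later.
interface PartitionSpec {
    List<Partition> getPartitions();
    List<String> getPartNames();
    Iterator<Partition> getPartitionIter();
    Iterator<String> getPartNameIter();
}

// Fallback that simply composes a List<Partition>: no worse than the status quo.
class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;
    DefaultPartitionSpec(List<Partition> partitions) { this.partitions = partitions; }
    public List<Partition> getPartitions() { return partitions; }
    public List<String> getPartNames() {
        return partitions.stream().map(Partition::name).collect(Collectors.toList());
    }
    public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
    public Iterator<String> getPartNameIter() { return getPartNames().iterator(); }
}

public class Main {
    public static void main(String[] args) {
        PartitionSpec spec = new DefaultPartitionSpec(Arrays.asList(
                new Partition(Arrays.asList("20140601", "US")),
                new Partition(Arrays.asList("20140601", "UK"))));
        System.out.println(spec.getPartNames());
    }
}
```

An {{HDFSDirBasedPartitionSpec}} would implement the same four methods but materialize Partition objects lazily from sub-directory names, sharing one StorageDescriptor.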
[jira] [Created] (HIVE-7258) Move qtest-Driver properties from pom to separate file
Szehon Ho created HIVE-7258: --- Summary: Move qtest-Driver properties from pom to separate file Key: HIVE-7258 URL: https://issues.apache.org/jira/browse/HIVE-7258 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Szehon Ho -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037921#comment-14037921 ] Szehon Ho commented on HIVE-7254: - Created a subtask HIVE-7258 on that suggestion. Anyone is welcome to take it. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7254: Issue Type: Test (was: Bug) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilbur Yang updated HIVE-7257: -- Attachment: HIVE-7257.1.patch UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Attachments: HIVE-7257.1.patch #1 Show the table: hive> describe ssga3; OK source string test float dt timestamp Time taken: 0.243 seconds #2 Run format_number on double and it works: hive> select format_number(cast(test as double),2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec MapReduce Total cumulative CPU time: 1 seconds 470 msec Ended Job = job_201403131616_0009 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 470 msec OK 1.00 2.00 Time taken: 16.563 seconds #3 Run format_number on float and it does not work: hive> select format_number(test,2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0010, Tracking URL = 
http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100% Ended Job = job_201403131616_0010 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt Task with the most failures(4): Task ID: task_201403131616_0010_m_00 Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141) .. 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilbur Yang updated HIVE-7257: -- Status: Patch Available (was: Open) UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Attachments: HIVE-7257.1.patch #1 Show the table: hive> describe ssga3; OK source string test float dt timestamp Time taken: 0.243 seconds #2 Run format_number on double and it works: hive> select format_number(cast(test as double),2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec MapReduce Total cumulative CPU time: 1 seconds 470 msec Ended Job = job_201403131616_0009 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 470 msec OK 1.00 2.00 Time taken: 16.563 seconds #3 Run format_number on float and it does not work: hive> select format_number(test,2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0010, Tracking URL = 
http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100% Ended Job = job_201403131616_0010 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt Task with the most failures(4): Task ID: task_201403131616_0010_m_00 Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141) .. 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec -- This message was sent by Atlassian JIRA (v6.2#6252)
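For intuition about the expected behavior: format_number's output for the working double case amounts to DecimalFormat-style formatting, and widening the FLOAT value to double before formatting yields the same result. This is a minimal sketch of that formatting semantics, not Hive's actual UDF code; the pattern string is an assumption chosen to match format_number(x, 2) output.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class Main {
    public static void main(String[] args) {
        float test = 1.0f;  // the FLOAT column value from the failing row
        // format_number(x, 2)-style pattern: thousands separators, two decimals.
        // Locale pinned to US so the decimal separator is '.' regardless of host.
        DecimalFormat df = new DecimalFormat("#,##0.00",
                DecimalFormatSymbols.getInstance(Locale.US));
        // Widening float to double reproduces the result of the working
        // cast(test as double) query from the report.
        System.out.println(df.format((double) test));
    }
}
```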
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jayesh updated HIVE-7100: - Attachment: (was: HIVE-7100.1.patch) Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jayesh updated HIVE-7100: - Attachment: HIVE-7100.1.patch Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jayesh updated HIVE-7100: - Attachment: HIVE-7100.1.patch Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
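For context on why this hits quotas: with HDFS trash enabled, a delete moves data under the owning user's trash directory instead of removing it, so the space stays charged against the user until trash is purged. A rough sketch of the per-user trash path convention (illustrative only; the user name and table path below are made up, and Hive's actual drop goes through the metastore warehouse layer):

```java
public class Main {
    // HDFS trash convention: deleted paths are moved under the owning
    // user's .Trash/Current directory, preserving the original path.
    static String trashPath(String user, String originalPath) {
        return "/user/" + user + "/.Trash/Current" + originalPath;
    }

    public static void main(String[] args) {
        // Dropping a table without skipTrash moves its data here,
        // so the user's quota is still charged for it:
        System.out.println(trashPath("bob", "/user/hive/warehouse/big_table"));
    }
}
```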
[jira] [Created] (HIVE-7259) Transaction manager fails when Oracle used as metastore RDBMS
Alan Gates created HIVE-7259: Summary: Transaction manager fails when Oracle used as metastore RDBMS Key: HIVE-7259 URL: https://issues.apache.org/jira/browse/HIVE-7259 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates The schema specification for the transaction tables creates NUMBER(10) columns for longs in the Oracle schema. However, these should be NUMBER(19). As it is, the JDBC calls to get the value as a long fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
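The size mismatch is easy to see from the range of a Java long: Oracle's NUMBER(10) holds at most ten decimal digits, while a long needs nineteen. A quick check:

```java
public class Main {
    public static void main(String[] args) {
        // Largest value a Java long (the type backing transaction ids) can hold:
        System.out.println(Long.MAX_VALUE);
        // Its decimal width: nineteen digits, hence NUMBER(19) in the Oracle schema.
        System.out.println(String.valueOf(Long.MAX_VALUE).length());
    }
}
```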
[jira] [Resolved] (HIVE-7259) Transaction manager fails when Oracle used as metastore RDBMS
[ https://issues.apache.org/jira/browse/HIVE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-7259. -- Resolution: Duplicate Transaction manager fails when Oracle used as metastore RDBMS - Key: HIVE-7259 URL: https://issues.apache.org/jira/browse/HIVE-7259 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates The schema specification for the transaction tables creates NUMBER(10) columns for longs in the Oracle schema. However, these should be NUMBER(19). As it is, the JDBC calls to get the value as a long fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7256) HiveTxnManager should be stateless
[ https://issues.apache.org/jira/browse/HIVE-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7256: - Assignee: Alan Gates (was: Eugene Koifman) HiveTxnManager should be stateless -- Key: HIVE-7256 URL: https://issues.apache.org/jira/browse/HIVE-7256 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Alan Gates In order to integrate HCat with Hive ACID, we should be able to create an instance of HiveTxnManager and use it to acquire locks, and release locks from a different instance of HiveTxnManager. One use case where this shows up is when a job using HCat is retried, since calls to TxnManager are made from the job's OutputCommitter. Another is HCatReader/Writer. For example, TestReaderWriter calls setupJob() from one instance of OutputCommitterContainer and commitJob() from another instance. The 2nd case is perhaps better solved by ensuring there is only 1 instance of OutputCommitterContainer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038022#comment-14038022 ] Gunther Hagleitner commented on HIVE-7254: -- I can try to factor that info into a separate file and make our build process work with it. What do you need on the ptest framework side? Would a property file work for you? Is the file that drives the test framework checked into the hive repo? Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038028#comment-14038028 ] Gunther Hagleitner commented on HIVE-7254: -- Alternatively if we have something like http://xmltwig.org/xmltwig/tools/xml_grep/xml_grep.html on the build nodes parsing the xml wouldn't be hard either. But that seems more brittle than maintaining the tests in a separate file. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
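The shared property file floated in the comments above could be consumed like this. The file contents and property keys here are hypothetical, sketched to mirror the per-driver qfile lists the qtest pom already carries:

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class Main {
    public static void main(String[] args) throws Exception {
        // Hypothetical shared file: both the qtest build and the ptest framework
        // could read each driver's qfile list from a single source of truth.
        String sharedConfig =
                "minimr.query.files=bucket4.q,scriptfile1.q\n" +
                "minitez.query.files=tez_join_tests.q,tez_dml.q\n";

        Properties props = new Properties();
        props.load(new StringReader(sharedConfig));

        // The ptest side would expand this into its per-driver include list.
        List<String> miniTezTests =
                Arrays.asList(props.getProperty("minitez.query.files").split(","));
        System.out.println(miniTezTests);
    }
}
```

A plain properties file keeps the list trivially parseable from both Maven and the ptest framework, avoiding XML parsing on the build nodes.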
[jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches
[ https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038043#comment-14038043 ] Jitendra Nath Pandey commented on HIVE-7105: +1 Enable ReduceRecordProcessor to generate VectorizedRowBatches - Key: HIVE-7105 URL: https://issues.apache.org/jira/browse/HIVE-7105 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Rajesh Balamohan Assignee: Gopal V Fix For: 0.14.0 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch Currently, ReduceRecordProcessor sends one key,value pair at a time to its operator pipeline. It would be beneficial to send VectorizedRowBatch to downstream operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-7188: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks to [~hsubramaniyan]! sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 
null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator
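For reference, the value the non-vectorized run produces (131) is simply a conditional count: sum(if(cond, 1, 0)) adds 1 per qualifying row. A plain-Java restatement of that aggregation; the column names come from the report, the rows are made up (the real data is in the attached tgz):

```java
public class Main {
    public static void main(String[] args) {
        // Made-up rows of (is_returning, is_free).
        // sum(if(is_returning=true and is_free=false, 1, 0))
        // counts rows where is_returning && !is_free.
        boolean[][] rows = {
            {true, false},   // counted
            {true, true},    // not counted: free
            {false, false},  // not counted: not returning
            {true, false},   // counted
        };
        long unpaidReturning = 0;
        for (boolean[] row : rows) {
            boolean isReturning = row[0], isFree = row[1];
            unpaidReturning += (isReturning && !isFree) ? 1 : 0;
        }
        System.out.println(unpaidReturning);
    }
}
```

The vectorized path must produce the same count; returning 0 means the conditional expression inside the aggregate was mis-evaluated per batch.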
[jira] [Commented] (HIVE-5155) Support secure proxy user access to HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038076#comment-14038076 ] Mani Rajash commented on HIVE-5155: --- The bug is marked as fixed; are there release notes or a guide to using this feature? Support secure proxy user access to HiveServer2 --- Key: HIVE-5155 URL: https://issues.apache.org/jira/browse/HIVE-5155 Project: Hive Issue Type: Improvement Components: Authentication, HiveServer2, JDBC Affects Versions: 0.12.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5155-1-nothrift.patch, HIVE-5155-noThrift.2.patch, HIVE-5155-noThrift.4.patch, HIVE-5155-noThrift.5.patch, HIVE-5155-noThrift.6.patch, HIVE-5155-noThrift.7.patch, HIVE-5155-noThrift.8.patch, HIVE-5155.1.patch, HIVE-5155.2.patch, HIVE-5155.3.patch, HIVE-5155.4.patch, HIVE-5155.5.patch, ProxyAuth.java, ProxyAuth.out, TestKERBEROS_Hive_JDBC.java HiveServer2 can authenticate a client via Kerberos and impersonate the connecting user with underlying secure hadoop. This becomes a gateway for a remote client to access a secure hadoop cluster. Now this works fine when the client obtains a Kerberos ticket and directly connects to HiveServer2. There's another big use case for middleware tools where the end user wants to access Hive via another server. For example Oozie action or Hue submitting queries or a BI tool server accessing HiveServer2. In these cases, the third party server doesn't have the end user's Kerberos credentials and hence it can't submit queries to HiveServer2 on behalf of the end user. This ticket is for enabling proxy access to HiveServer2 for third party tools on behalf of end users. There are two parts of the solution proposed in this ticket: 1) Delegation token based connection for Oozie (OOZIE-1457) This is the common mechanism for Hadoop ecosystem components. Hive Remote Metastore and HCatalog already support this. 
This is suitable for a tool like Oozie that submits the MR jobs as actions on behalf of its client. Oozie already uses a similar mechanism for Metastore/HCatalog access. 2) Direct proxy access for privileged hadoop users The delegation token implementation can be a challenge for non-hadoop (especially non-java) components. This second part enables a privileged user to directly specify an alternate session user during the connection. If the connecting user has hadoop-level privilege to impersonate the requested userid, then HiveServer2 will run the session as that requested user. For example, user Hue is allowed to impersonate user Bob (via core-site.xml proxy user configuration). Then user Hue can connect to HiveServer2 and specify Bob as the session user via a session property. HiveServer2 will verify Hue's proxy user privilege and then impersonate user Bob instead of Hue. This will enable any third party tool to impersonate an alternate userid without having to implement a delegation token connection. -- This message was sent by Atlassian JIRA (v6.2#6252)
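In answer to the release-notes question above, the part-2 mechanism is driven through a JDBC session property. A sketch of what the connection URL looks like; the host, port, and Kerberos principal below are placeholders, and hive.server2.proxy.user is the session property this feature uses:

```java
public class Main {
    public static void main(String[] args) {
        // A privileged user (hue) authenticates with its own Kerberos principal
        // but asks HiveServer2 to run the session as another user (bob).
        String url = "jdbc:hive2://hs2host:10000/default"
                   + ";principal=hive/_HOST@EXAMPLE.COM"   // server principal (placeholder)
                   + ";hive.server2.proxy.user=bob";       // requested session user
        System.out.println(url);
        // HiveServer2 consults the hadoop proxy-user config (core-site.xml)
        // to verify hue may impersonate bob before accepting the session.
    }
}
```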
[jira] [Created] (HIVE-7260) between operator for vectorization should support non-constant expressions as inputs
Hari Sankar Sivarama Subramaniyan created HIVE-7260: --- Summary: between operator for vectorization should support non-constant expressions as inputs Key: HIVE-7260 URL: https://issues.apache.org/jira/browse/HIVE-7260 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Follow-up jira for HIVE-7160. E.g. query where vectorization is disabled: select x from T where T.y between T.a and T.d; -- This message was sent by Atlassian JIRA (v6.2#6252)
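The predicate in the example query, evaluated row-wise, is the semantics a column-operand vectorized BETWEEN would need to reproduce: two comparisons per row, inclusive on both ends. A sketch with made-up data mirroring columns y, a, d:

```java
public class Main {
    // T.y between T.a and T.d, evaluated per row: inclusive on both ends.
    static boolean between(long y, long a, long d) {
        return a <= y && y <= d;
    }

    public static void main(String[] args) {
        // Made-up rows of (y, a, d).
        long[][] rows = { {5, 1, 9}, {5, 6, 9}, {5, 1, 5} };
        for (long[] r : rows) {
            System.out.println(between(r[0], r[1], r[2]));
        }
    }
}
```

The existing vectorized BETWEEN only accepts constant bounds; supporting column expressions means comparing three column vectors element-wise per batch.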
[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038081#comment-14038081 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-7166: - Created HIVE-7260 as a follow-up jira. Thanks, Hari Vectorization with UDFs returns incorrect results - Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Assignee: Hari Sankar Sivarama Subramaniyan Priority: Minor Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem: TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
package com.test;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import java.lang.String;
import java.lang.*;
public class tenThousand extends UDF {
  private final LongWritable result = new LongWritable();
  public LongWritable evaluate() {
    result.set(1);
    return result;
  }
}
TEST DATA (test.input):
1|CBCABC|12
2|DBCABC|13
3|EBCABC|14
4|ABCABC|15
5|BBCABC|16
6|CBCABC|17
CREATING ORC TABLE: 0: jdbc:hive2://server:10002/db create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true"); CREATE LOADING TABLE: 0: jdbc:hive2://server:10002/db create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile; COPY IN DATA: [root@server]# hadoop fs 
-copyFromLocal /tmp/test.input /db/loading/. ORC DATA: [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e insert into table testTabOrc partition(range) select * from loadingDir; LOAD TEST FUNCTION: 0: jdbc:hive2://server:10002/db add jar /opt/hadoop/lib/testFunction.jar 0: jdbc:hive2://server:10002/db create temporary function ten_thousand as 'com.test.tenThousand'; TURN OFF VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=false; QUERY (RESULTS AS EXPECTED): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ | 1 | | 2 | | 3 | ++ 3 rows selected (15.286 seconds) TURN ON VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=true; QUERY AGAIN (WRONG RESULTS): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ ++ No rows selected (17.763 seconds) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038085#comment-14038085 ] Hive QA commented on HIVE-7234: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651316/HIVE-7234.3.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5640 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/518/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/518/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-518/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12651316

Select on decimal column throws NPE --- Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch

Select on decimal column throws NPE for values greater than the maximum permissible value (99). Steps to repro:
{code}
DROP TABLE IF EXISTS DECIMAL;
CREATE TABLE DECIMAL (dec decimal);
-- Content of decimal_10_0.txt = 99.999
LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO TABLE DECIMAL;
SELECT dec FROM DECIMAL; -- throws NPE
DROP TABLE DECIMAL;
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
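A defensive way to avoid the NPE above is to treat a decimal that exceeds the column's declared precision as SQL NULL instead of letting a null intermediate propagate. The sketch below illustrates that convention only; the `enforce` helper is hypothetical and is not Hive's actual `HiveDecimal` code path.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalReader {
    // Return v rounded to (precision, scale) if it fits, else null --
    // modeling the "out-of-range decimal becomes SQL NULL" convention.
    // Hypothetical helper; callers must null-check the result.
    static BigDecimal enforce(BigDecimal v, int precision, int scale) {
        BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
        if (scaled.precision() > precision) {
            return null; // value does not fit the declared type
        }
        return scaled;
    }
}
```

With a maximum of two digits, `99.999` rounds to `100`, overflows, and maps to NULL rather than crashing the reader.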
[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038086#comment-14038086 ] Szehon Ho commented on HIVE-7254: - Thanks, property file would work fine for Ptest framework. The refactoring of normal build process can be done as a first change. The PTest framework today generates cmds like mvn test -Dtest=TestMiniTezCliDriver -Dqfile=$batch1 using its own property file, and will continue to work. The actual Ptest framework class is checked into the hive repo at: [TestParser.java|https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java]. It runs against the Ptest conf file trunk-mr2.properties (attached). As a second step, we can replace the hard-coded test names there with references to the new property file, and have TestParser read those. I can take a look at this, but if anyone is interested, build instructions for Ptest framework are at : [README|https://github.com/apache/hive/blob/trunk/testutils/ptest2/README.md]. In development, I usually run TestParser locally against the file to verify the generated-batches, for example see [TestTestParser.java|https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestTestParser.java]. Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. 
For example, CLIDriver is configured with the directory ql/src/test/queries/clientpositive. However, the configurations for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under that directory, so we have to use the include configuration to hard-code a list of tests for each to run. This duplicates the list of each miniDriver's tests already in the /itests/qtest pom file, and the two can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
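The single-source-of-truth idea above amounts to both the pom and TestParser reading a driver's qfile list from one properties file. A minimal sketch of that read, assuming an illustrative key name and comma-separated format (these are not the actual keys used by the itests pom or trunk-mr2.properties):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DriverTestList {
    // Parse one driver's qfile list out of a shared properties file, e.g.
    //   minitez.query.files=tez_join.q, tez_union.q
    static List<String> qfiles(String propsText, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new IllegalStateException(e); // cannot happen for a StringReader
        }
        List<String> out = new ArrayList<>();
        for (String f : p.getProperty(key, "").split(",")) {
            if (!f.trim().isEmpty()) {
                out.add(f.trim()); // tolerate whitespace around entries
            }
        }
        return out;
    }
}
```

Both the build and the Ptest batch generator could then call the same parser, so the lists can no longer drift apart.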
[jira] [Updated] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
[ https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7254: Attachment: trunk-mr2.properties Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test --- Key: HIVE-7254 URL: https://issues.apache.org/jira/browse/HIVE-7254 Project: Hive Issue Type: Test Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Attachments: trunk-mr2.properties Today, the Hive PTest infrastructure has a test-driver configuration called directory, so it will run all the qfiles under that directory for that driver. For example, CLIDriver is configured with directory ql/src/test/queries/clientpositive However the configuration for the miniXXXDrivers (miniMRDriver, miniMRDriverNegative, miniTezDriver) run only a select number of tests under directory. So we have to use the include configuration to hard-code a list of tests for it to run. This is duplicating the list of each miniDriver's tests already in the /itests/qtest pom file, and can get out of date. It would be nice if both got their information the same way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()
[ https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038092#comment-14038092 ] Eugene Koifman commented on HIVE-7249: -- [~alangates] org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned gets wedged with this patch

HiveTxnManager.closeTxnManger() throws if called after commitTxn() -- Key: HIVE-7249 URL: https://issues.apache.org/jira/browse/HIVE-7249 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Alan Gates Attachments: HIVE-7249.patch

I openTxn() and acquireLocks() for a query that looks like INSERT INTO T PARTITION(p) SELECT * FROM T. Then I call commitTxn(). Then I call closeTxnManger(). I get an exception saying lock not found (the only lock in this txn). So it seems TxnMgr doesn't know that commit released the locks. Here is the stack trace and some log output which may be useful:
{noformat}
2014-06-17 15:54:40,771 DEBUG mapreduce.TransactionContext (TransactionContext.java:onCommitJob(128)) - onCommitJob(job_local557130041_0001). this=46719652
2014-06-17 15:54:40,771 DEBUG lockmgr.DbTxnManager (DbTxnManager.java:commitTxn(205)) - Committing txn 1
2014-06-17 15:54:40,771 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp
2014-06-17 15:54:40,772 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1423)) - Going to execute query select txn_state from TXNS where txn_id = 1 for update
2014-06-17 15:54:40,773 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1438)) - Going to execute update update TXNS set txn_last_heartbeat = 1403045680772 where txn_id = 1
2014-06-17 15:54:40,778 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatTxn(1440)) - Going to commit
2014-06-17 15:54:40,779 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(344)) - Going to execute insert insert into COMPLETED_TXN_COMPONENTS select tc_txnid, tc_database, tc_table, tc_partition from TXN_COMPONENTS where tc_txnid = 1
2014-06-17 15:54:40,784 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(352)) - Going to execute update delete from TXN_COMPONENTS where tc_txnid = 1
2014-06-17 15:54:40,788 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(356)) - Going to execute update delete from HIVE_LOCKS where hl_txnid = 1
2014-06-17 15:54:40,791 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(359)) - Going to execute update delete from TXNS where txn_id = 1
2014-06-17 15:54:40,794 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(361)) - Going to commit
2014-06-17 15:54:40,795 WARN mapreduce.TransactionContext (TransactionContext.java:cleanup(317)) - cleanupJob(JobID=job_local557130041_0001)this=46719652
2014-06-17 15:54:40,795 DEBUG lockmgr.DbLockManager (DbLockManager.java:unlock(109)) - Unlocking id:1
2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) - Going to execute query values current_timestamp
2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1402)) - Going to execute update update HIVE_LOCKS set hl_last_heartbeat = 1403045680796 where hl_lock_ext_id = 1
2014-06-17 15:54:40,800 DEBUG txn.TxnHandler (TxnHandler.java:heartbeatLock(1405)) - Going to rollback
2014-06-17 15:54:40,804 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - NoSuchLockException(message:No such lock: 1)
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1407)
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:477)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:4817)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
    at com.sun.proxy.$Proxy14.unlock(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1598)
    at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:110)
    at org.apache.hadoop.hive.ql.lockmgr.DbLockManager.close(DbLockManager.java:162)
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.destruct(DbTxnManager.java:300)
    at org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.closeTxnManager(HiveTxnManagerImpl.java:39)
    at
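The failure mode in the log above — commit deletes the HIVE_LOCKS rows server-side while the client-side manager still remembers the lock id, so close() gets NoSuchLockException — can be modeled in miniature. Class and method names below are illustrative, not Hive's DbTxnManager API.

```java
import java.util.HashSet;
import java.util.Set;

public class ToyTxnManager {
    private final Set<Long> clientLocks = new HashSet<>(); // what the client thinks it holds
    private final Set<Long> serverLocks = new HashSet<>(); // what the metastore knows

    void acquireLock(long id) {
        clientLocks.add(id);
        serverLocks.add(id);
    }

    // forgetClientSide=false reproduces the bug; true models the fix:
    // commit releases locks server-side, so the client must forget them too.
    void commitTxn(boolean forgetClientSide) {
        serverLocks.clear();
        if (forgetClientSide) {
            clientLocks.clear();
        }
    }

    // Returns how many unlock calls would fail with NoSuchLockException.
    int close() {
        int failures = 0;
        for (long id : clientLocks) {
            if (!serverLocks.remove(id)) {
                failures++; // server already dropped this lock at commit
            }
        }
        clientLocks.clear();
        return failures;
    }
}
```

Keeping the client's lock set in sync with the commit is the whole fix in this toy model: once `commitTxn` clears both sides, `close()` has nothing stale to unlock.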
[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038111#comment-14038111 ] Ashish Kumar Singh commented on HIVE-7234: -- The test errors do not look related to the patch.

Select on decimal column throws NPE --- Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch

Select on decimal column throws NPE for values greater than the maximum permissible value (99). Steps to repro:
{code}
DROP TABLE IF EXISTS DECIMAL;
CREATE TABLE DECIMAL (dec decimal);
-- Content of decimal_10_0.txt = 99.999
LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO TABLE DECIMAL;
SELECT dec FROM DECIMAL; -- throws NPE
DROP TABLE DECIMAL;
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()
[ https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038119#comment-14038119 ] Eugene Koifman commented on HIVE-7249: -- Here is the thread dump, though there doesn't appear to be anything interesting in it:
{noformat}
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true -Dapple.awt.UIElement=true
57554
87066 /Users/ekoifman/dev/hive/hcatalog/core/target/surefire/surefirebooter3727332902234772866.jar
87243 sun.tools.jps.Jps
87056 org.codehaus.plexus.classworlds.launcher.Launcher
ekoifman:hcatalog ekoifman$ jstack 87066
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true -Dapple.awt.UIElement=true
2014-06-19 16:38:27
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.51-b01-457 mixed mode):

"Attach Listener" daemon prio=9 tid=7ffded8c7800 nid=0x10c84 waiting on condition []
   java.lang.Thread.State: RUNNABLE

"BoneCP-pool-watch-thread" daemon prio=5 tid=7ffde9e89000 nid=0x10defb000 waiting on condition [10defa000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 7b8e93d10 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:322)
    at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:75)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)

"BoneCP-keep-alive-scheduler" daemon prio=5 tid=7ffde9e88000 nid=0x10ddf8000 waiting on condition [10ddf7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 7b8fde4d8 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
    at java.lang.Thread.run(Thread.java:680)

"com.google.common.base.internal.Finalizer" daemon prio=5 tid=7ffde9e9a000 nid=0x10dcf5000 in Object.wait() [10dcf4000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 7b906a3a8 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked 7b906a3a8 (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at com.google.common.base.internal.Finalizer.run(Finalizer.java:127)

"BoneCP-pool-watch-thread" daemon prio=5 tid=7ffde91c6800 nid=0x10d068000 waiting on condition [10d067000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 7b870b118 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:322)
    at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:75)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)

"BoneCP-keep-alive-scheduler" daemon prio=5 tid=7ffdec031800 nid=0x10cf65000 waiting on condition [10cf64000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 7b86fd7c0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
    at
[jira] [Updated] (HIVE-6207) Integrate HCatalog with locking
[ https://issues.apache.org/jira/browse/HIVE-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6207: - Attachment: HIVE-6207.4.patch preliminary patch Integrate HCatalog with locking --- Key: HIVE-6207 URL: https://issues.apache.org/jira/browse/HIVE-6207 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-6207.4.patch HCatalog currently ignores any locks created by Hive users. It should respect the locks Hive creates as well as create locks itself when locking is configured. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22772: HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22772/#review46246 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java https://reviews.apache.org/r/22772/#comment81557 Rather than having to compare the actual class/class name of the type, call PrimitiveObjectInspector.getPrimitiveCategory(), which returns an enum corresponding to the type. Take a look at GenericUDFPrintf, where Xuefu made similar changes to the printf() function to support char/varchar. - Jason Dere On June 19, 2014, 6:55 p.m., Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22772/ --- (Updated June 19, 2014, 6:55 p.m.) Review request for hive. Bugs: HIVE-6637 https://issues.apache.org/jira/browse/HIVE-6637 Repository: hive-git Description --- HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java ea52537d0b85191f0b633a29aa3f7ddb556c288d ql/src/test/queries/clientpositive/udf_in_file.q 9d9efe8e23d6e73429ee5cd2c8470359ba2b3498 ql/src/test/results/clientpositive/udf_in_file.q.out b63143760d80f3f6a8ba0a23c0d87e8bb86fce66 Diff: https://reviews.apache.org/r/22772/diff/ Testing --- Tested with qtest. Thanks, Ashish Singh
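The review advice above — dispatch on the primitive-category enum rather than comparing inspector classes — can be sketched with a simplified stand-in enum. Hive's real enum lives in PrimitiveObjectInspector.PrimitiveCategory; the class and method here are illustrative only.

```java
public class CategoryDispatch {
    // Simplified stand-in for PrimitiveObjectInspector.PrimitiveCategory.
    enum PrimitiveCategory { STRING, CHAR, VARCHAR, INT }

    // Enum dispatch: one switch covers all string-family types, so adding
    // CHAR/VARCHAR support does not require class-name comparisons.
    static boolean isStringFamily(PrimitiveCategory c) {
        switch (c) {
            case STRING:
            case CHAR:
            case VARCHAR:
                return true;
            default:
                return false;
        }
    }
}
```

Compared with checking `getClass().getName()`, the enum switch is exhaustive-checkable by the compiler and survives refactors of the inspector classes.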
[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case
[ https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038234#comment-14038234 ] Hive QA commented on HIVE-7063: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5656 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651239 Optimize for the Top N within a Group use case -- Key: HIVE-7063 URL: https://issues.apache.org/jira/browse/HIVE-7063 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7063.1.patch, HIVE-7063.2.patch It is common to rank within a Group/Partition and then only return the Top N entries within each Group. 
With Streaming mode for Windowing, we should push the post filter on the rank into the Windowing processing as a Limit expression. -- This message was sent by Atlassian JIRA (v6.2#6252)
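The optimization described in HIVE-7063 above — pushing a post-filter on rank() into streaming windowing as a limit — can be sketched outside Hive: once N rows of the current partition have been emitted, the rest of that partition is skipped instead of being ranked and filtered later. The sketch assumes input pre-sorted by (partition key, order key); names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingTopN {
    // r = {partitionKey, value}; emits at most n rows per partition.
    static List<String> topNPerGroup(List<String[]> rows, int n) {
        List<String> out = new ArrayList<>();
        String currentGroup = null;
        int emitted = 0;
        for (String[] r : rows) {
            if (!r[0].equals(currentGroup)) {
                currentGroup = r[0];
                emitted = 0;            // new partition resets the limit
            }
            if (emitted < n) {
                out.add(r[0] + ":" + r[1]);
                emitted++;
            }                           // else: skip row, acting as a limit
        }
        return out;
    }
}
```

The win is that the skipped rows never pass through the downstream filter operator at all, which is exactly what pushing the rank predicate into the windowing function buys.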
[jira] [Updated] (HIVE-7231) Improve ORC padding
[ https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7231: - Attachment: HIVE-7231.3.patch Improve ORC padding --- Key: HIVE-7231 URL: https://issues.apache.org/jira/browse/HIVE-7231 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch Current ORC padding is not optimal because of fixed stripe sizes within block. The padding overhead will be significant in some cases. Also padding percentage relative to stripe size is not configurable. -- This message was sent by Atlassian JIRA (v6.2#6252)
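The padding tradeoff HIVE-7231 targets can be made concrete: ORC pads to the HDFS block boundary when the next stripe would straddle it, and the issue argues the tolerated waste should be relative to the stripe size and configurable. A sketch under those assumptions (method names and the tolerance knob are illustrative, not ORC's actual heuristic):

```java
public class OrcPadding {
    // Bytes wasted if we pad from `offset` to the next block boundary.
    static long paddingBytes(long offset, long blockSize) {
        long inBlock = offset % blockSize;
        return inBlock == 0 ? 0 : blockSize - inBlock;
    }

    // Pad only when the waste stays under `tolerance` of the stripe size;
    // otherwise let the stripe straddle the block boundary.
    static boolean shouldPad(long offset, long blockSize,
                             long stripeSize, double tolerance) {
        return paddingBytes(offset, blockSize) <= stripeSize * tolerance;
    }
}
```

With a fixed stripe size the padding per block is whatever remainder happens to be left, which is why making the tolerance a fraction of the stripe size bounds the overhead.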
[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7250: - Fix Version/s: 0.14.0 Adaptive compression buffer size for wide tables in ORC --- Key: HIVE-7250 URL: https://issues.apache.org/jira/browse/HIVE-7250 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, HIVE-7250.4.patch, HIVE-7250.5.patch If the input table is wide (in the order of 1000s), ORC compression buffer size overhead becomes significant causing OOM issues. To overcome this issue, buffer size should be adaptively chosen based on the available memory and the number of columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
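The adaptive idea in HIVE-7250 above is that with a fixed buffer per column stream, a table with thousands of columns multiplies memory use until the writer OOMs, so the buffer should shrink to fit a memory budget. A sketch of that adjustment; the streams-per-column constant and halving loop are illustrative, not ORC's exact heuristic:

```java
public class AdaptiveBuffer {
    // Shrink the compression buffer until
    //   columns * streamsPerColumn * bufferSize <= memoryBudget,
    // with a floor so the buffer stays usable.
    static int chooseBufferSize(int defaultSize, int numColumns,
                                long memoryBudget) {
        int streamsPerColumn = 4;   // rough estimate per column
        int size = defaultSize;
        long needed = (long) numColumns * streamsPerColumn * size;
        while (needed > memoryBudget && size > 4 * 1024) {
            size /= 2;              // halve until the estimate fits
            needed = (long) numColumns * streamsPerColumn * size;
        }
        return size;
    }
}
```

A narrow table keeps the default 256 KB buffer, while a 1000-column table under a 64 MB budget drops to a few tens of kilobytes per stream.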
[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038286#comment-14038286 ] Prasanth J commented on HIVE-7250: -- Patch committed to trunk. Adaptive compression buffer size for wide tables in ORC --- Key: HIVE-7250 URL: https://issues.apache.org/jira/browse/HIVE-7250 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, HIVE-7250.4.patch, HIVE-7250.5.patch If the input table is wide (in the order of 1000s), ORC compression buffer size overhead becomes significant causing OOM issues. To overcome this issue, buffer size should be adaptively chosen based on the available memory and the number of columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC
[ https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038284#comment-14038284 ] Prasanth J commented on HIVE-7250: -- The recent patch does not change the outcome of unit tests. Adaptive compression buffer size for wide tables in ORC --- Key: HIVE-7250 URL: https://issues.apache.org/jira/browse/HIVE-7250 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.14.0 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, HIVE-7250.4.patch, HIVE-7250.5.patch If the input table is wide (in the order of 1000s), ORC compression buffer size overhead becomes significant causing OOM issues. To overcome this issue, buffer size should be adaptively chosen based on the available memory and the number of columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestamp in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6394: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the contribution! I have committed this to trunk. Implement Timestamp in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.7.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize
Matt McCline created HIVE-7262: -- Summary: Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize Key: HIVE-7262 URL: https://issues.apache.org/jira/browse/HIVE-7262 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline In ptf.q, create the part table with STORED AS ORC and SET hive.vectorized.execution.enabled=true; Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during vectorization and suffer an exception: ERROR vector.VectorizationContext (VectorizationContext.java:getInputColumnIndex(186)) - The column BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map. Jitendra pointed out that the routine that returns the VectorizationContext in Vectorize.java needs to add virtual columns to the map, too. -- This message was sent by Atlassian JIRA (v6.2#6252)
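The failure mode in HIVE-7262 above is a name-to-index map built from the table's real columns only, so a lookup of a virtual column like BLOCK__OFFSET__INSIDE__FILE misses. A sketch of registering virtual columns after the real ones (simplified; not Vectorizer's actual code, though the two virtual-column names are Hive's):

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnMap {
    static final String[] VIRTUAL_COLUMNS = {
        "BLOCK__OFFSET__INSIDE__FILE", "INPUT__FILE__NAME"
    };

    static Map<String, Integer> build(String[] realColumns) {
        Map<String, Integer> map = new HashMap<>();
        int idx = 0;
        for (String c : realColumns) {
            map.put(c, idx++);          // real columns first
        }
        for (String v : VIRTUAL_COLUMNS) {
            map.put(v, idx++);          // the step the bug was missing
        }
        return map;
    }
}
```

Without the second loop, `getInputColumnIndex("BLOCK__OFFSET__INSIDE__FILE")` has nothing to find, which matches the logged error.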
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038405#comment-14038405 ] Brock Noland commented on HIVE-7230: [~davidzchen] thank you very much for the contribution! This LGTM... I think we should change two things before commit: 1) add the Apache license to the formatter file 2) move it into a new dir, dev-support Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7029) Vectorize ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038450#comment-14038450 ] Hive QA commented on HIVE-7029: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651238/HIVE-7029.2.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5639 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/520/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/520/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-520/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12651238 Vectorize ReduceWork Key: HIVE-7029 URL: https://issues.apache.org/jira/browse/HIVE-7029 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch This will enable vectorization team to independently work on vectorization on reduce side even before vectorized shuffle is ready. NOTE: Tez only (i.e. TezTask only) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038456#comment-14038456 ] Ashish Kumar Singh commented on HIVE-7234: -- Thanks [~xuefuz] and [~swarnim] for reviewing.

Select on decimal column throws NPE --- Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch

Select on decimal column throws NPE for values greater than the maximum permissible value (99). Steps to repro:
{code}
DROP TABLE IF EXISTS DECIMAL;
CREATE TABLE DECIMAL (dec decimal);
-- Content of decimal_10_0.txt = 99.999
LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO TABLE DECIMAL;
SELECT dec FROM DECIMAL; -- throws NPE
DROP TABLE DECIMAL;
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Chen updated HIVE-7230: - Attachment: HIVE-7230.3.patch Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch, HIVE-7230.3.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038472#comment-14038472 ]

David Chen commented on HIVE-7230:
----------------------------------

Thanks, [~brocknoland]! I have added the license header to the formatter file and moved it into a new dev-support directory. I have attached a new patch and updated the RB.
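A formatter file of this kind is an Eclipse JDT code-formatter profile exported as XML. The fragment below is only an illustrative sketch of the file format — the profile name and setting values shown are assumptions, not the actual contents of the HIVE-7230 attachment:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<profiles version="12">
  <profile kind="CodeFormatterProfile" name="Hive" version="12">
    <!-- Indent with spaces, 2 per level (illustrative values) -->
    <setting id="org.eclipse.jdt.core.formatter.tabulation.char" value="space"/>
    <setting id="org.eclipse.jdt.core.formatter.tabulation.size" value="2"/>
    <!-- Wrap lines at 100 columns -->
    <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="100"/>
  </profile>
</profiles>
```

Such a file is imported through Preferences > Java > Code Style > Formatter > Import, after which Eclipse applies the profile on format actions.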
branch for cbo work
Hi all,

Some of you may have noticed that cost-based optimizer work is going on at HIVE-5775. John has put up an initial patch there as well, but there is a lot more work that needs to be done. Following our tradition of doing large feature work in a branch, I propose that we create a branch, commit this patch to it, and then continue to work on it in the branch to improve it. Hopefully, we can get it into shape so that we can merge it into trunk once it's ready. Unless I hear otherwise, I plan to create a branch and commit this initial patch by early next week.

Design doc is located here: https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive

Thanks,
Ashutosh
[jira] [Created] (HIVE-7263) Missing fixes from review of parquet-timestamp
Szehon Ho created HIVE-7263:
----------------------------
Summary: Missing fixes from review of parquet-timestamp
Key: HIVE-7263
URL: https://issues.apache.org/jira/browse/HIVE-7263
Project: Hive
Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho
[jira] [Updated] (HIVE-7251) Fix StorageDescriptor usage in unit tests
[ https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-7251:
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.14.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks, Pankit!

Fix StorageDescriptor usage in unit tests
-----------------------------------------
Key: HIVE-7251
URL: https://issues.apache.org/jira/browse/HIVE-7251
Project: Hive
Issue Type: Bug
Components: Tests
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-7251.patch

Current approach: The StorageDescriptor class describes parameters such as InputFormat, OutputFormat, and SerDeInfo for a Hive table. Some of its fields, such as InputFormat, OutputFormat, and SerDeInfo.serializationLib, are required when creating a StorageDescriptor object. For example, the createTable command in the MetaStoreClient creates the table with the default values of these fields defined in HiveConf or hive-default.xml. In unit tests, however, a table is created in a slightly different way, so these values need to be set explicitly. Thus, when creating tables in tests, the required fields of the StorageDescriptor object need to be set.

Issue with the current approach: From some of the current usages of this class in unit tests, I noticed that when a test case tried to clean up the database and found a table created by a previously executed test case, the cleanup process fetches the Table object and performs sanity checks, which include checking required fields such as InputFormat, OutputFormat, and SerDeInfo.serializationLib. These sanity checks fail, which causes the test case to fail.

Fix: In unit tests, the StorageDescriptor object should be created with the fields that are sanity-checked when fetching the table.

NOTE: This fix fixes 6 test cases in itests/hive-unit/.
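The fix amounts to populating every field the sanity checks read before creating the table. Below is a minimal self-contained sketch of the idea; the classes are simplified stand-ins for the thrift-generated org.apache.hadoop.hive.metastore.api.StorageDescriptor and SerDeInfo, and the format/serde class names are the usual Hive text-table defaults, shown here as an assumption rather than what the actual patch uses:

```java
// Simplified stand-ins for the thrift-generated metastore classes;
// a real unit test would use org.apache.hadoop.hive.metastore.api.*.
class SerDeInfo {
    private String serializationLib;
    public void setSerializationLib(String lib) { this.serializationLib = lib; }
    public String getSerializationLib() { return serializationLib; }
}

class StorageDescriptor {
    private String inputFormat;
    private String outputFormat;
    private SerDeInfo serdeInfo;
    public void setInputFormat(String f) { this.inputFormat = f; }
    public void setOutputFormat(String f) { this.outputFormat = f; }
    public void setSerdeInfo(SerDeInfo s) { this.serdeInfo = s; }
    public String getInputFormat() { return inputFormat; }
    public String getOutputFormat() { return outputFormat; }
    public SerDeInfo getSerdeInfo() { return serdeInfo; }
}

public class StorageDescriptorExample {
    // Build a StorageDescriptor with every field the table sanity checks
    // inspect, so cleanup of leftover test tables does not fail.
    static StorageDescriptor newSanityCheckedSd() {
        StorageDescriptor sd = new StorageDescriptor();
        sd.setInputFormat("org.apache.hadoop.mapred.TextInputFormat");
        sd.setOutputFormat("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");
        SerDeInfo serde = new SerDeInfo();
        serde.setSerializationLib("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
        sd.setSerdeInfo(serde);
        return sd;
    }

    public static void main(String[] args) {
        StorageDescriptor sd = newSanityCheckedSd();
        System.out.println(sd.getInputFormat());
        System.out.println(sd.getSerdeInfo().getSerializationLib());
    }
}
```

The key point is that none of the three checked fields (input format, output format, serialization library) is left null when the test creates the table.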
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038491#comment-14038491 ]

Brock Noland commented on HIVE-7230:
------------------------------------

+1 pending tests. Thank you!!
Review Request 22804: HIVE-7263 - Missing fixes from review of parquet-timestamp
-------------------------------------------
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22804/
-------------------------------------------

Review request for hive and Brock Noland.

Repository: hive-git

Description
-----------

This is for HIVE-6394 (parquet timestamp). There had been a review comment about not relying on the example parquet classes, which are just a suggestion of how to implement timestamps; it is trivial to implement that sample class in the Hive code base. I had addressed it in one of the patches, but the next patch did not carry that change forward, as I made a mistake. Addressing it again now.

Diffs
-----

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 73cf0f5
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java 06987ad
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 8bb9cb1
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java f56a643

Diff: https://reviews.apache.org/r/22804/diff/

Testing
-------

Ran the affected parquet timestamp tests.

Thanks,
Szehon Ho
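For context, the NanoTime representation being moved into Hive stores a timestamp as a Julian day number plus the nanoseconds within that day (Parquet's INT96 timestamp layout). The following is a rough self-contained sketch of that split using plain epoch arithmetic; the class and method names are illustrative assumptions and it ignores the calendar subtleties the real NanoTimeUtils handles:

```java
public class NanoTimeSketch {
    // Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long SECONDS_PER_DAY = 86400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Split an epoch instant into (julianDay, nanosOfDay), the two
    // fields a Parquet INT96 timestamp carries.
    static long[] toJulian(long epochSeconds, long nanoAdjustment) {
        // floorDiv/floorMod keep pre-epoch (negative) instants correct.
        long epochDay = Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
        long julianDay = epochDay + JULIAN_DAY_OF_EPOCH;
        long nanosOfDay = Math.floorMod(epochSeconds, SECONDS_PER_DAY) * NANOS_PER_SECOND
            + nanoAdjustment;
        return new long[] { julianDay, nanosOfDay };
    }

    public static void main(String[] args) {
        // The epoch itself: Julian day 2440588, zero nanos into the day.
        long[] epoch = toJulian(0L, 0L);
        System.out.println(epoch[0] + " " + epoch[1]);
    }
}
```

The review above is only about where these helper classes live in the Hive tree, not about changing this encoding.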