[jira] [Created] (HIVE-6020) Support UUIDs and versioning for DBs/Tables/Partitions/Columns
Carl Steinbach created HIVE-6020: Summary: Support UUIDs and versioning for DBs/Tables/Partitions/Columns Key: HIVE-6020 URL: https://issues.apache.org/jira/browse/HIVE-6020 Project: Hive Issue Type: Bug Components: Database/Schema, Metastore Reporter: Carl Steinbach Assignee: Carl Steinbach -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846152#comment-13846152 ] Hive QA commented on HIVE-6018: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618358/HIVE-6018.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4763 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/621/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/621/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12618358 FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution. {noformat} public Deserializer getDeserializer() { try { return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties()); } catch (Exception e) { return null; } } {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
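A side note on the snippet quoted in the issue: NoClassDefFoundError extends Error (via LinkageError), not Exception, so the catch (Exception e) block in getDeserializer() never intercepts it and the error escapes to the caller. A minimal, self-contained Java sketch (hypothetical names, not Hive code) illustrating why:

```java
public class LinkErrorDemo {
    // NoClassDefFoundError is a LinkageError, which sits under Error in the
    // Throwable hierarchy. A catch (Exception e) clause therefore lets it
    // pass straight through, which is how the swallow-everything catch in
    // PartitionDesc.getDeserializer() can still surface this error.
    static String classify(Runnable r) {
        try {
            r.run();
        } catch (Exception e) {
            return "exception";
        } catch (LinkageError e) {   // would catch NoClassDefFoundError
            return "linkage-error";
        }
        return "ok";
    }
}
```

This is why fixes in this area tend to remove the metastore class reference entirely rather than widen the catch clause: catching Error/LinkageError is generally discouraged.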
[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846191#comment-13846191 ] Hive QA commented on HIVE-1466: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618376/HIVE-1466.1.patch {color:green}SUCCESS:{color} +1 4765 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618376 Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Assignee: Prasad Mujumdar Attachments: HIVE-1466.1.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
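To make the proposal concrete, here is a minimal sketch of what NULL DEFINED AS would control during row serialization. This is a hypothetical serializer, not Hive's actual SerDe code; the point is only that the sentinel written for NULL fields becomes a per-table parameter instead of a fixed backslash-N:

```java
import java.util.StringJoiner;

public class RowFormat {
    // Hypothetical text serializer: each field is rendered with a field
    // separator, and NULL fields are written as a configurable sentinel.
    // Under the proposal, ROW FORMAT ... NULL DEFINED AS '...' would feed
    // the nullSentinel value, making export and transform output consistent.
    static String serialize(Object[] row, char fieldSep, String nullSentinel) {
        StringJoiner line = new StringJoiner(String.valueOf(fieldSep));
        for (Object field : row) {
            line.add(field == null ? nullSentinel : field.toString());
        }
        return line.toString();
    }
}
```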
[jira] [Updated] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets
[ https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5973: - Attachment: HIVE-5973.1.patch Attached is the test and a fix. The problem occurs when the small table is bucketed and partitioned and has a select sub-query. The select operator that is introduced as part of the sub-query causes the issue described. Thanks to [~rhbutani] for helping with the solution and test case. It looks like the right way to run these types of tests is via the MinimrCliDriver, as the CliDriver tests mask the issue by having a single reducer, resulting in incorrect bucketing. SMB joins produce incorrect results with multiple partitions and buckets Key: HIVE-5973 URL: https://issues.apache.org/jira/browse/HIVE-5973 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5973.1.patch It looks like there is an issue with re-using the output object array in the select operator. When we read rows of the non-big tables, we hold on to the output object in the priority queue. This causes Hive to produce incorrect results because all the elements in the priority queue refer to the same object and the join happens on only one of the buckets. {noformat} output[i] = eval[i].evaluate(row); {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets
[ https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5973: - Status: Patch Available (was: Open) SMB joins produce incorrect results with multiple partitions and buckets Key: HIVE-5973 URL: https://issues.apache.org/jira/browse/HIVE-5973 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5973.1.patch It looks like there is an issue with re-using the output object array in the select operator. When we read rows of the non-big tables, we hold on to the output object in the priority queue. This causes hive to produce incorrect results because all the elements in the priority queue refer to the same object and the join happens on only one of the buckets. {noformat} output[i] = eval[i].evaluate(row); {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16213: HIVE-5973: SMB joins produce incorrect results with multiple partitions and buckets
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16213/ --- Review request for hive, Navis Ryu and Harish Butani. Bugs: HIVE-5973 https://issues.apache.org/jira/browse/HIVE-5973 Repository: hive-git Description --- SMB joins produce incorrect results with multiple partitions and buckets Diffs - itests/hive-unit/pom.xml dae4e50 itests/qtest/pom.xml 8c249a0 ql/src/java/org/apache/hadoop/hive/ql/exec/DummyStoreOperator.java acdb040 ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q PRE-CREATION ql/src/test/results/clientpositive/auto_sortmerge_join_16.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16213/diff/ Testing --- New test case added. Thanks, Vikram Dixit Kumaraswamy
[jira] [Commented] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets
[ https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846211#comment-13846211 ] Vikram Dixit K commented on HIVE-5973: -- https://reviews.apache.org/r/16213/ SMB joins produce incorrect results with multiple partitions and buckets Key: HIVE-5973 URL: https://issues.apache.org/jira/browse/HIVE-5973 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5973.1.patch It looks like there is an issue with re-using the output object array in the select operator. When we read rows of the non-big tables, we hold on to the output object in the priority queue. This causes hive to produce incorrect results because all the elements in the priority queue refer to the same object and the join happens on only one of the buckets. {noformat} output[i] = eval[i].evaluate(row); {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets
[ https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846235#comment-13846235 ] Hive QA commented on HIVE-5973: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618384/HIVE-5973.1.patch {color:green}SUCCESS:{color} +1 4763 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/623/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/623/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618384 SMB joins produce incorrect results with multiple partitions and buckets Key: HIVE-5973 URL: https://issues.apache.org/jira/browse/HIVE-5973 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5973.1.patch It looks like there is an issue with re-using the output object array in the select operator. When we read rows of the non-big tables, we hold on to the output object in the priority queue. This causes hive to produce incorrect results because all the elements in the priority queue refer to the same object and the join happens on only one of the buckets. {noformat} output[i] = eval[i].evaluate(row); {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
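The aliasing problem described in the report can be reproduced outside Hive in a few lines of Java. This is a hypothetical sketch, not the actual SelectOperator code: every element handed to the queue is the same reusable array, so the queue ends up seeing only the last row's values unless the row is copied first.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RowBuffering {
    // Buggy pattern: every enqueued element aliases the same reusable
    // array, so after the loop all queue entries hold the last row.
    static Object[][] bufferAliased(int[][] rows) {
        Object[] output = new Object[1];            // reused across rows
        Queue<Object[]> queue = new ArrayDeque<>();
        for (int[] row : rows) {
            output[0] = row[0];                     // output[i] = eval[i].evaluate(row);
            queue.add(output);                      // stores a reference, not a copy
        }
        return queue.toArray(new Object[0][]);
    }

    // Fixed pattern: copy the evaluated row before handing it to the
    // queue, so each entry keeps its own values.
    static Object[][] bufferCopied(int[][] rows) {
        Object[] output = new Object[1];
        Queue<Object[]> queue = new ArrayDeque<>();
        for (int[] row : rows) {
            output[0] = row[0];
            queue.add(output.clone());              // defensive copy breaks the aliasing
        }
        return queue.toArray(new Object[0][]);
    }
}
```

With two input rows, the aliased variant reports the second row's value twice, which mirrors how all priority-queue entries in the SMB join ended up referring to one bucket.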
[jira] [Created] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
Sun Rui created HIVE-6021: - Summary: Problem in GroupByOperator for handling distinct aggrgations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 
10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method forgets to handle this case: for a query with distinct aggregations, an aggregation function may have a parameter that is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names = parameters.get(j).getExprString().split("\\."); // parameters of the form : KEY.colx:t.coly if (Utilities.ReduceField.KEY.name().equals(names[0])) { String name = names[names.length - 2]; int tag = Integer.parseInt(name.split("\\:")[1]); ... 
} else { // will be VALUE._COLx if (!nonDistinctAggrs.contains(i)) { nonDistinctAggrs.add(i); } } {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
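For illustration, here is a standalone Java sketch of the parameter-name parsing involved. This is a hypothetical helper, not the actual GroupByOperator code: a distinct-key parameter such as KEY._col1:0._col0 carries a union tag after the colon, while a plain group-by key parameter such as KEY._col0 does not. Blindly indexing the result of splitting on the colon for the latter would yield exactly the ArrayIndexOutOfBoundsException: 1 seen in the stack trace.

```java
public class KeyParamParser {
    // Returns the union tag of a distinct-key parameter, or -1 for
    // parameters without one (plain group-by key columns, or VALUE._colX
    // parameters of non-distinct aggregates). The missing-tag branch is
    // the case the report says initializeOp() forgets to handle.
    static int unionTagOf(String exprString) {
        String[] names = exprString.split("\\.");
        if (!"KEY".equals(names[0])) {
            return -1;                      // VALUE._colX: non-distinct aggregate
        }
        String name = names[names.length - 2];
        String[] parts = name.split("\\:");
        if (parts.length < 2) {
            return -1;                      // group-by key column, no union tag
        }
        return Integer.parseInt(parts[1]);  // distinct key: KEY._colN:tag._colM
    }
}
```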
[jira] [Updated] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Rui updated HIVE-6021: -- Attachment: HIVE-6021.1.patch Problem in GroupByOperator for handling distinct aggrgations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6021.1.patch Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 
10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method forgets to handle this case: for a query with distinct aggregations, an aggregation function may have a parameter that is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names = parameters.get(j).getExprString().split("\\."); // parameters of the form : KEY.colx:t.coly if (Utilities.ReduceField.KEY.name().equals(names[0])) { String name = names[names.length - 2];
[jira] [Created] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places
Teruyoshi Zenmyo created HIVE-6022: -- Summary: Load statements with incorrect order of partitions put input files to unreadable places Key: HIVE-6022 URL: https://issues.apache.org/jira/browse/HIVE-6022 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Teruyoshi Zenmyo Load statements with incorrect order of partitions put input files to incorrect paths. {code} CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 string); LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test_parts PARTITION (p2='p1', p1='p2') {code} The input file is located as below and the data is not readable. {code} % find /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts//p1=p2 /user/hive/warehouse/test_parts//p1=p2/p2=p1 /user/hive/warehouse/test_parts//p2=p1 /user/hive/warehouse/test_parts//p2=p1/p1=p2 /user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc /user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places
[ https://issues.apache.org/jira/browse/HIVE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teruyoshi Zenmyo updated HIVE-6022: --- Attachment: HIVE-6022.1.patch.txt Load statements with incorrect order of partitions put input files to unreadable places --- Key: HIVE-6022 URL: https://issues.apache.org/jira/browse/HIVE-6022 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Teruyoshi Zenmyo Attachments: HIVE-6022.1.patch.txt Load statements with incorrect order of partitions put input files to incorrect paths. {code} CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 string); LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test_parts PARTITION (p2='p1', p1='p2') {code} The input file is located as below and the data is not readable. {code} % find /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts//p1=p2 /user/hive/warehouse/test_parts//p1=p2/p2=p1 /user/hive/warehouse/test_parts//p2=p1 /user/hive/warehouse/test_parts//p2=p1/p1=p2 /user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc /user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places
[ https://issues.apache.org/jira/browse/HIVE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846397#comment-13846397 ] Xuefu Zhang commented on HIVE-6022: --- [~tzenmyo] Thanks for your contribution. Could you please put a review board entry here? Load statements with incorrect order of partitions put input files to unreadable places --- Key: HIVE-6022 URL: https://issues.apache.org/jira/browse/HIVE-6022 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Teruyoshi Zenmyo Attachments: HIVE-6022.1.patch.txt Load statements with incorrect order of partitions put input files to incorrect paths. {code} CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 string); LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE test_parts PARTITION (p2='p1', p1='p2') {code} The input file is located as below and the data is not readable. {code} % find /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts/ /user/hive/warehouse/test_parts//p1=p2 /user/hive/warehouse/test_parts//p1=p2/p2=p1 /user/hive/warehouse/test_parts//p2=p1 /user/hive/warehouse/test_parts//p2=p1/p1=p2 /user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc /user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
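One plausible fix direction for the problem above (a hypothetical helper, not the actual Hive patch) is to reorder the user-supplied PARTITION (...) spec to match the table's declared partition-column order before building the target path, so that PARTITION (p2='p1', p1='p2') still lands in .../p1=p2/p2=p1:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionSpec {
    // Builds the warehouse subpath for a partition by walking the table's
    // declared partition columns in order and looking each one up in the
    // user-supplied spec, instead of trusting the spec's own ordering.
    static String partitionPath(List<String> tableCols, Map<String, String> spec) {
        StringBuilder path = new StringBuilder();
        for (String col : tableCols) {          // canonical column order
            String value = spec.get(col);
            if (value == null) {
                throw new IllegalArgumentException("missing partition column: " + col);
            }
            path.append('/').append(col).append('=').append(value);
        }
        return path.toString();
    }
}
```

With this normalization, the statement order in the LOAD command no longer affects where the file is placed, and the data stays readable.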
[jira] [Commented] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846400#comment-13846400 ] Xuefu Zhang commented on HIVE-6021: --- [~sunrui] Thanks for your contribution. Do you mind providing the following? 1. A test case similar to what you constructed to produce the problem? 2. A review board entry. Problem in GroupByOperator for handling distinct aggrgations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6021.1.patch Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method forgets to handle this case: for a query with distinct aggregations, an aggregation function may have a parameter that is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names =
[jira] [Commented] (HIVE-5824) Support generation of html test reports in maven
[ https://issues.apache.org/jira/browse/HIVE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846414#comment-13846414 ] Ashutosh Chauhan commented on HIVE-5824: I am able to generate the test report even without this patch, using the commands above. It seems the patch is no longer required. [~prasanth_j] Can you close this out if that's indeed the case? Support generation of html test reports in maven Key: HIVE-5824 URL: https://issues.apache.org/jira/browse/HIVE-5824 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: build, maven, test Attachments: HIVE-5824.2.patch.txt, HIVE-5824.patch.txt {code}ant testreport{code} generates test results in html format. It would be useful to support the same in maven. The default test report generated by maven is in XML format, which is hard to read. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6023) Numeric Data Type Support
Deepak Raj created HIVE-6023: Summary: Numeric Data Type Support Key: HIVE-6023 URL: https://issues.apache.org/jira/browse/HIVE-6023 Project: Hive Issue Type: Improvement Components: Database/Schema, File Formats Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension Reporter: Deepak Raj Many companies are rethinking their strategies to adopt Hive into their ETL simply because it does not support basic data types like Numeric(a,b). I believe this should be improved in upcoming versions of Hive. Can we extend Hive with a custom UDF for a Numeric(a,b) data type? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6023) Numeric Data Type Support
[ https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846477#comment-13846477 ] Eric Hanson commented on HIVE-6023: --- This'd be a nice addition. Also, the code in the Hive trunk now has support for decimal(p, s) which is functionally equivalent to numeric(p, s). Numeric Data Type Support - Key: HIVE-6023 URL: https://issues.apache.org/jira/browse/HIVE-6023 Project: Hive Issue Type: Improvement Components: Database/Schema, File Formats Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension Reporter: Deepak Raj Labels: Data, Hive, Numeric, Type1 Original Estimate: 2h Remaining Estimate: 2h Many companies are rethinking their strategies to adapt Hive into their ETL just for the reason that it does not support the most basic data types like Numeric(a,b). I believe there should be an improvement with upcoming versions of hive. Can we extend the hive with custom UDF for Numeric(a,b) data type? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
Ashutosh Chauhan created HIVE-6024: -- Summary: Load data local inpath unnecessarily creates a copy task Key: HIVE-6024 URL: https://issues.apache.org/jira/browse/HIVE-6024 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ashutosh Chauhan The load data command creates an additional copy task only when it is loading from {{local}}. It doesn't create this additional copy task while loading from DFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions
[ https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846481#comment-13846481 ] Matt Tucker commented on HIVE-5555: --- Does this also allow for SQL-89 style joins? {noformat} explain select * from part p1, part p2, part p3 where p1.p_name = p2.p_name and p2.p_name = p3.p_name; {noformat} Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions Key: HIVE-5555 URL: https://issues.apache.org/jira/browse/HIVE-5555 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Attachments: AlternativeJoinSyntax.pdf Certain tools still generate 'old style' join queries where the join condition is in the WHERE clause. A related set of issues that can be addressed is that of pushing forward joining conditions, in a manner similar to the Predicate Pushdown feature of Hive. E.g. these queries can have join conditions pushed down: {noformat} - query 1, push join predicate from 2nd join to 1st explain select * from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name = p3.p_name; - query 2 explain select * from part p1 join part p2 join part p3 where p1.p_name = p2.p_name and p2.p_name = p3.p_name; {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846485#comment-13846485 ] Ashutosh Chauhan commented on HIVE-6024: This results in inconsistent semantics: while loading from a local source, files are *not* moved but copied, whereas while loading from DFS, source files are moved and thus deleted at the source location after the operation. Ideally the same move semantics (delete at source) should be provided whether loading from DFS or from local. Exactly what the semantics should be can be debated; however, the scope of this jira is limited to not creating an additional copy task while loading from local, but rather doing the copy (instead of the move) in MoveTask itself, thus saving an unnecessary task execution and FS operations. Load data local inpath unnecessarily creates a copy task Key: HIVE-6024 URL: https://issues.apache.org/jira/browse/HIVE-6024 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ashutosh Chauhan The load data command creates an additional copy task only when it is loading from {{local}}. It doesn't create this additional copy task while loading from DFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Comment Edited] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846485#comment-13846485 ] Ashutosh Chauhan edited comment on HIVE-6024 at 12/12/13 5:41 PM: -- This results in an inconsistent semantic: while loading from a local source, files are *not* moved but copied, whereas while loading from DFS sources, files are moved and thus deleted at the source location after the operation. Ideally the same load semantic (delete at source) should be provided whether loading from DFS or from local. Exactly what the semantic should be can be debated; however, the scope of this jira is limited to not creating an additional copy task while loading from local, but rather doing the copy (instead of the move) in MoveTask itself, thus saving an unnecessary task execution and FS operations. was (Author: ashutoshc): This results in an inconsistent semantic: while loading from a local source, files are *not* moved but copied, whereas while loading from DFS sources, files are moved and thus deleted at the source location after the operation. Ideally the same move semantic (delete at source) should be provided whether loading from DFS or from local. Exactly what the semantic should be can be debated; however, the scope of this jira is limited to not creating an additional copy task while loading from local, but rather doing the copy (instead of the move) in MoveTask itself, thus saving an unnecessary task execution and FS operations. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
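The copy-versus-move distinction discussed above maps onto plain filesystem operations. A minimal sketch using java.nio (the class and helper names are invented for illustration; this is not Hive's actual MoveTask code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveVsCopy {
    // Move semantics (LOAD DATA from DFS): the source disappears afterwards.
    static void moveLoad(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    // Copy semantics (LOAD DATA LOCAL today): the source is left behind.
    static void copyLoad(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("loaddemo");
        Path src = Files.write(dir.resolve("src.txt"), "row1\n".getBytes());
        Path dst = dir.resolve("dst.txt");

        copyLoad(src, dst);
        System.out.println("after copy, source exists: " + Files.exists(src)); // true

        moveLoad(src, dst);
        System.out.println("after move, source exists: " + Files.exists(src)); // false
    }
}
```

The jira proposes keeping one task (MoveTask) and choosing between these two operations there, rather than scheduling a separate copy task for the local case.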
[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6018: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.13.0 Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws a NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846533#comment-13846533 ] Ashutosh Chauhan commented on HIVE-6016: Instead of doing the filter.accept() logic twice (after this patch), it seems it's enough to do it just once in the outer loop (as introduced in this patch). Shall we remove the existing filter.accept() from the inner loop? Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prasanth and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you have files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output of (a,b). Instead, we get (a,b,null): hasNext looks at the next value to see if it's null and uses that to decide whether there are any more entries, and thus (a,b,_s) happens to become (a,b). There's a boundary condition on the very first pick, however, which causes (_s,a,b) to bypass the filter, so we wind up with an unfiltered result (_s,a,b) that ORC breaks on. The effect of this bug is that ORC will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
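The fix direction described above — apply the filter while advancing a look-ahead, so filtered entries never surface as nulls and the first-element boundary is handled — can be sketched as a generic iterator. This is an illustrative rewrite, not the actual Hadoop23Shims code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Look-ahead iterator: the filter runs while advancing, so rejected entries
// are skipped entirely instead of being emitted as null placeholders.
public class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Predicate<T> filter;
    private T next; // next accepted element, or null once exhausted

    public FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
        this.inner = inner;
        this.filter = filter;
        advance(); // prime the look-ahead, covering the first-element boundary case
    }

    private void advance() {
        next = null;
        while (inner.hasNext()) {
            T candidate = inner.next();
            if (filter.test(candidate)) { next = candidate; break; }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public T next() {
        if (next == null) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        // "_s" first exercises the boundary condition described in the bug report.
        Iterator<String> it = new FilteringIterator<String>(
                Arrays.asList("_s", "a", "b").iterator(), s -> !s.startsWith("_"));
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        System.out.println(out); // [a, b]
    }
}
```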
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846553#comment-13846553 ] Eric Hanson commented on HIVE-5996: --- Xuefu, I'm all for new, useful functionality and better performance for Hive. And I'm all for getting correct results. I appreciate your contributions and your passion. But I strongly believe changing behavior from one reasonable alternative to another in a way that breaks backward compatibility is not the way to go. I have a lot of experience with evolving a database (SQL Server) over a decade, and have talked to many people who've been evolving the product longer than that. From this experience, I can say that breaking backward compatibility (for either functionality or performance, but especially functionality), even in subtle ways, can anger customers/users. Any changes to semantics like this should first of all be avoided, and if they can't be avoided, they need to be rolled out carefully, with a switch to enable backward compatibility. SQL Server has compatibility levels and SET options as switches, and a defined deprecation schedule. This is kind of process-heavy in the engineering effort, and also causes an explosion of the test matrix. So I am not necessarily recommending that Hive go there, though maybe we need to have that discussion. I think we're better off being strict about not breaking backward compatibility unless really needed. So, I ask that you please close this JIRA without making a patch. There are a couple of other areas where there is an issue of ANSI SQL compatibility (the result type of int/int and avg(int)). We could have a further discussion on those, though you know my preference would be to leave the semantics as-is there, since I think backward compatibility trumps ANSI SQL compatibility in those cases. If there is no issue of ANSI compatibility, and the current Hive behavior is reasonable, I'd like us to leave things as they are.
I don't think there is a need to be across-the-board compatible with another system (MySQL or anything else). Best regards, Eric P.S. Your specific argument that you can overflow a bigint sum, while technically accurate, is I think not a significant user issue. I've never heard a complaint about it with SQL Server or PDW, our scale-out data warehouse appliance. Really big numbers, like the national debt in pennies, fit in a bigint, just to put it in perspective. Users can cast the input to decimal or double if they need more magnitude. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
6666666666666666666
5555555555555555555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+----------------------+
| sum(l)               |
+----------------------+
| 12222222222222222221 |
+----------------------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
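The wrap-around discussed in this issue is ordinary two's-complement overflow of a 64-bit accumulator. A minimal sketch, assuming (as the wrapped result suggests) that the sum is kept in a Java long; this is illustrative code, not Hive's actual GenericUDAFSum implementation:

```java
public class SumOverflow {
    // A sum(bigint)-style accumulator held in a 64-bit signed integer:
    // on overflow the value wraps silently instead of raising an error.
    static long wrappingSum(long... values) {
        long sum = 0;
        for (long v : values) sum += v; // may silently wrap past Long.MAX_VALUE
        return sum;
    }

    public static void main(String[] args) {
        // Two rows suffice once the values are near Long.MAX_VALUE:
        // 6666666666666666666 + 5555555555555555555 = 12222222222222222221,
        // which exceeds Long.MAX_VALUE (9223372036854775807) and wraps to
        // 12222222222222222221 - 2^64 = -6224521851487329395.
        System.out.println(wrappingSum(6666666666666666666L, 5555555555555555555L));

        // Math.addExact surfaces the overflow instead of wrapping:
        try {
            Math.addExact(6666666666666666666L, 5555555555555555555L);
        } catch (ArithmeticException e) {
            System.out.println("long overflow detected");
        }
    }
}
```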
[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions
[ https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846557#comment-13846557 ] Harish Butani commented on HIVE-5555: - Had filed HIVE-5558 to address this. Haven't gotten around to doing it yet. Do you need this urgently? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 15663: Hive should be able to skip header and footer rows when reading data file for a table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15663/#review30269 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15663/#comment57938 this is the maximum number of footers a user can define. This prevents a user from defining too many footers, which would consume memory common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15663/#comment57939 fixed itests/qtest/pom.xml https://reviews.apache.org/r/15663/#comment57940 this is necessary if the test is in the MinimrCliDriver test class ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57941 comment added ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57942 Hi Eric, do you mean I need a blank line before each comment? I didn't see this in other parts of the code. Or do you mean a space after //? ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57943 fixed ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57944 name changed ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57945 Since the return behavior is different in the two places, it is hard to reuse the code because of the minor differences ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57946 fixed ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57947 fixed ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java https://reviews.apache.org/r/15663/#comment57948 fixed ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java https://reviews.apache.org/r/15663/#comment57949 fixed the comment.
Since I need a deep copy of the key and value fields through ReflectionUtils, this new class is necessary ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java https://reviews.apache.org/r/15663/#comment57950 fixed ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java https://reviews.apache.org/r/15663/#comment57951 fixed ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java https://reviews.apache.org/r/15663/#comment57952 fixed ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java https://reviews.apache.org/r/15663/#comment57953 fixed ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java https://reviews.apache.org/r/15663/#comment57954 since the header and footer are removed per file, I think it should be fine if multiple splits are combined, since each file will have its own path ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java https://reviews.apache.org/r/15663/#comment57955 yes, otherwise an exception will be thrown when accessing pathToPartitionInfo during the test, since the job context is incomplete in the unit test ql/src/test/queries/clientpositive/file_with_header_footer.q https://reviews.apache.org/r/15663/#comment57956 negative tests added for this scenario - Shuaishuai Nie On Nov. 19, 2013, 1:31 a.m., Eric Hanson wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15663/ --- (Updated Nov. 19, 2013, 1:31 a.m.) Review request for hive and Thejas Nair.
Bugs: HIVE-5795 https://issues.apache.org/jira/browse/HIVE-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (I am uploading this on behalf of Shuaishuai Nie since he's not in the office) Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 32ab3d8 data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml a453d8a ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 5abcfc1 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 0ec6e63
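The header/footer skipping under review can be sketched independently of the record-reader plumbing: drop the first N rows outright, and withhold the last M rows by buffering M rows at a time. An illustrative simplification with invented names, not the patch's actual code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class HeaderFooterSkip {
    // Skip the first `header` rows; hold back the last `footer` rows by
    // buffering them in a fixed-size queue, emitting a row only once
    // `footer` newer rows have been seen behind it.
    static List<String> skip(List<String> rows, int header, int footer) {
        List<String> out = new ArrayList<>();
        Deque<String> buffer = new ArrayDeque<>();
        int seen = 0;
        for (String row : rows) {
            if (seen++ < header) continue;           // drop header rows
            buffer.addLast(row);
            if (buffer.size() > footer) out.add(buffer.removeFirst());
        }
        return out; // the last `footer` rows remain in the buffer and are dropped
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("h1", "r1", "r2", "r3", "f1");
        System.out.println(skip(rows, 1, 1)); // [r1, r2, r3]
    }
}
```

The buffering approach matters because the footer count is unknown until end-of-file is reached; it also motivates the cap on the number of footer rows mentioned in the review, since the buffer must hold that many rows in memory.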
Re: Review Request 15663: Hive should be able to skip header and footer rows when reading data file for a table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15663/#review30270 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15663/#comment57957 Hi Eric, I uploaded the new diff with the fixes here https://reviews.apache.org/r/16184/diff/#index_header common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15663/#comment57958 Hi Thejas, I uploaded the new diff file with the fixes in the review board here: https://reviews.apache.org/r/16184/diff/#index_header - Shuaishuai Nie On Nov. 19, 2013, 1:31 a.m., Eric Hanson wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15663/ --- (Updated Nov. 19, 2013, 1:31 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-5795 https://issues.apache.org/jira/browse/HIVE-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (I am uploading this on behalf of Shuaishuai Nie since he's not in the office) Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 32ab3d8 data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml a453d8a ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 5abcfc1 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 0ec6e63 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975 ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b 
ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION Diff: https://reviews.apache.org/r/15663/diff/ Testing --- Thanks, Eric Hanson
[jira] [Updated] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs
[ https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-5756: -- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs Key: HIVE-5756 URL: https://issues.apache.org/jira/browse/HIVE-5756 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, HIVE-5756.7.patch, HIVE-5756.8.patch Implement full, end-to-end support for IF in vectorized mode, including new VectorExpression class(es), VectorizationContext translation to a VectorExpression, and unit tests for these, as well as end-to-end ad hoc testing. An end-to-end .q test is recommended but optional. This is high priority because IF is the most popular conditional expression. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs
[ https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846596#comment-13846596 ] Eric Hanson commented on HIVE-5756: --- Thanks for the review, Jitendra! Committed to trunk. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846635#comment-13846635 ] Eric Hanson commented on HIVE-5996: --- Xuefu, I see you want to make changes to have Hive be more in line with MySQL semantics. Can you explain why you're making these changes or considering them? Thanks, Eric -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6023) Numeric Data Type Support
[ https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846647#comment-13846647 ] Eric Hanson commented on HIVE-6023: --- Not that I know of. You could implement UDFs to do operations on some other type, like a string, to get the semantics you want, but that sounds like too much work for this situation. Numeric Data Type Support - Key: HIVE-6023 URL: https://issues.apache.org/jira/browse/HIVE-6023 Project: Hive Issue Type: Improvement Components: Database/Schema, File Formats Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension Reporter: Deepak Raj Labels: Data, Hive, Numeric, Type1 Original Estimate: 2h Remaining Estimate: 2h Many companies are rethinking their strategies to adopt Hive into their ETL just because it does not support the most basic data types, like Numeric(a,b). I believe there should be an improvement in upcoming versions of Hive. Can we extend Hive with a custom UDF for a Numeric(a,b) data type? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6023) Numeric Data Type Support
[ https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846640#comment-13846640 ] Deepak Raj commented on HIVE-6023: -- Well, the latest version does come with decimal(p,s), but is there a way we can add a custom data type to an older Hive version? I don't think we can. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
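On older versions without a native decimal(p,s) type, NUMERIC(p,s)-style semantics can be approximated inside UDF code with java.math.BigDecimal. A hypothetical sketch (toNumeric is an invented helper, not a Hive API):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NumericDemo {
    // Coerce a value to NUMERIC(precision, scale) semantics: fix the scale
    // with rounding, then reject values whose total digit count exceeds
    // the declared precision.
    static BigDecimal toNumeric(String value, int precision, int scale) {
        BigDecimal d = new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP);
        if (d.precision() > precision) {
            throw new ArithmeticException(
                "value exceeds NUMERIC(" + precision + "," + scale + ")");
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("123.456", 5, 2)); // 123.46
        System.out.println(toNumeric("0.005", 5, 2));   // 0.01
    }
}
```

This only emulates storage/rounding behavior; arithmetic, comparison, and SerDe integration would still need the real decimal type.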
[jira] [Commented] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846675#comment-13846675 ] Eric Hanson commented on HIVE-6015: --- +1 vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: vectorization Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6025) Add Prasad to committer list
Prasad Mujumdar created HIVE-6025: - Summary: Add Prasad to committer list Key: HIVE-6025 URL: https://issues.apache.org/jira/browse/HIVE-6025 Project: Hive Issue Type: Test Components: Documentation Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Priority: Minor -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6025) Add Prasad to committer list
[ https://issues.apache.org/jira/browse/HIVE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-6025: -- Attachment: HIVE-6025.1.patch Add Prasad to committer list - Key: HIVE-6025 URL: https://issues.apache.org/jira/browse/HIVE-6025 Project: Hive Issue Type: Test Components: Documentation Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-6025.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16207/ --- (Updated Dec. 12, 2013, 8:20 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-1466 https://issues.apache.org/jira/browse/HIVE-1466 Repository: hive-git Description --- Support a configurable null format for tables and for writing out to a directory. Using a non-default null format is a bit tricky while creating a table, and pretty much impossible when exporting data to the local filesystem using insert overwrite directory. The patch enhances the SQL syntax to support the 'NULL DEFINED AS' construct for create table as well as for insert overwrite directory. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5e5b8cf ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d0a0ec7 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16207/diff/ Testing --- Added new tests. Thanks, Prasad Mujumdar
[jira] [Commented] (HIVE-6025) Add Prasad to committer list
[ https://issues.apache.org/jira/browse/HIVE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846699#comment-13846699 ] Xuefu Zhang commented on HIVE-6025: --- +1 Add Prasad to committer list - Key: HIVE-6025 URL: https://issues.apache.org/jira/browse/HIVE-6025 Project: Hive Issue Type: Test Components: Documentation Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-6025.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846705#comment-13846705 ] Xuefu Zhang commented on HIVE-5996: --- [~ehans] Thanks for sharing your thoughts, and for your inquiry. For your information, I'm not trying to make MySQL the model. My first line of consideration is the SQL standard. Where there is no SQL standard for a piece of functionality, Hive doesn't have to invent everything, so I do reference MySQL for ideas, mostly because MySQL and its technical documentation are readily available. However, this doesn't preclude me from following other DBs' practice. For instance, precision/scale determination for arithmetic operations in Hive follows SQL Server's formula. I'm neither anti- nor pro-MySQL, nor am I toward SQL Server, but I strongly believe that following well-established practices benefits Hive more than doing something in a unique, unfortunate way. An example would be int/int in Hive. However, a lot of existing functionality in Hive was put into place when Hive was positioned as a tool rather than a DB, and before all the necessary data types were introduced. Take int/int again as an example: the early developers probably didn't even think about SQL compliance, and even if they did, there was no decimal data type to consider. As Hive shifts to positioning itself as a DB on big data, I believe we should start thinking from a perspective other than just performance or backward compatibility. If we restrict ourselves based on unconscious decisions made in the past, we may lose a lot of opportunities to do the right thing. As I worked on decimal precision/scale support, I found a lot of problems in Hive around data types and their conversions and promotions. In many cases, Hive is not even consistent with itself. Let me ask you a question to see if you know the answer: what's the return type of 35 + '3.14', where 35 is from an int column and '3.14' from a string column?
Before I made the changes, you probably would say: wait, let me read the code first. And your answer might be different if my question were 35/'3.14'. Now, to answer the same questions, I can give the right answer right away, and I have a theory to explain why. In summary, it has been a lot of effort to clean up the mess and inconsistency in Hive since the beginning of my work on decimal. Now if we use either performance or backward compatibility to shut down what we have achieved, I don't see how Hive shifts from a tool to a DB, or how Hive can become adopted as an enterprise-grade product. Hive is still evolving, and that's why I think we have a certain luxury to break backward compatibility in order to do the right thing. As Ashutosh once mentioned, we don't want to be backward compatible with a bug. Once Hive is stabilized, it becomes much harder to make backward-incompatible changes, as you know from your experience with SQL Server. I understand your concern about backward compatibility, especially your possible frustration over vectorization breaking or needing rework. On the other hand, I think we are here to help Hive become more useful. A blunt rejection without much consideration and communication doesn't seem as helpful and constructive as it could be. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
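Xuefu's 35 + '3.14' question admits at least two plausible semantics. A hypothetical sketch of both interpretations in plain Java (this is illustrative only, not Hive's actual coercion rules — which rule applies is exactly the point of the question):

```java
public class CoercionDemo {
    // Interpretation 1: promote both operands to double
    // (the string is parsed as a floating-point number).
    static double addAsDouble(int i, String s) {
        return i + Double.parseDouble(s);
    }

    // Interpretation 2: coerce the string down to the int operand's type
    // first, losing the fractional part.
    static int addAsInt(int i, String s) {
        return i + (int) Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(addAsDouble(35, "3.14")); // ~38.14
        System.out.println(addAsInt(35, "3.14"));    // 38
    }
}
```

A consistent type-promotion theory picks one of these (or raises an error) deterministically for every operand pair, which is the kind of cleanup the comment above argues for.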
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846718#comment-13846718 ] Thejas M Nair commented on HIVE-5996: - bq. Now, to answer the same questions, I can give the right answer right away, and I have a theory to explain why. It would be great if you could document the theory; otherwise I would still need to look at the code to understand it! :) I really appreciate the code cleanup you have been doing. But we have to be careful about backward compatibility. I also agree that we should not burden new users with historic problems. Regarding "Once Hive is stabilized": how do we define that? Maybe once we create a list of the non-backward-compatible changes that are important to make, we can cut a major release version (1.x) in which we break backward compatibility for certain things, and document it very well. Hopefully that list of non-backward-compatible changes can be kept small. I discuss this in the context of config defaults in HIVE-5875. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846721#comment-13846721 ] Thejas M Nair commented on HIVE-5996:
Regarding the specific change in this jira, I am not convinced that it is an important non-backward-compatible change to make. You can have an overflow even with the decimal type, if the values are large enough, with just two rows. On the other hand, int division returning double is arguably a change to consider for a 1.0 candidate, as that is a SQL-compliance issue.

Query for sum of a long column of a table with only two rows produces wrong result
--
Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+--------+
| sum(l) |
+--------+
|   1221 |
+--------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very problematic.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-5824) Support generation of html test reports in maven
[ https://issues.apache.org/jira/browse/HIVE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J resolved HIVE-5824. -- Resolution: Not A Problem As [~ashutoshc] pointed out, this patch is not required as the surefire reporting plugin was already in pom.xml. Closing it as Not a Problem. Support generation of html test reports in maven Key: HIVE-5824 URL: https://issues.apache.org/jira/browse/HIVE-5824 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Labels: build, maven, test Attachments: HIVE-5824.2.patch.txt, HIVE-5824.patch.txt {code}ant testreport{code} generates test results in HTML format. It would be useful to support the same in Maven. The default test report generated by Maven is in XML format, which is hard to read. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults
[ https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-4395: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Support TFetchOrientation.FIRST for HiveServer2 FetchResults Key: HIVE-4395 URL: https://issues.apache.org/jira/browse/HIVE-4395 Project: Hive Issue Type: Improvement Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch, HIVE-4395.3.patch, HIVE-4395.4.patch, HIVE-4395.5.patch, HIVE-4395.6.patch Currently HiveServer2 only supports fetching the next row (TFetchOrientation.NEXT). This ticket is to implement support for TFetchOrientation.FIRST, which resets the fetch position to the beginning of the result set. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6026) Ldap Authenticator should be more generic with BindDN
Johndee Burks created HIVE-6026: --- Summary: Ldap Authenticator should be more generic with BindDN Key: HIVE-6026 URL: https://issues.apache.org/jira/browse/HIVE-6026 Project: Hive Issue Type: Bug Components: Authentication Affects Versions: 0.10.0 Environment: CDH4.4, Fedora Directory Service Reporter: Johndee Burks Priority: Minor The bindDN implementation should be more generic for the LDAP authenticator. Currently it looks like this:
{code}
// setup the security principal
String bindDN;
if (baseDN != null) {
  bindDN = "uid=" + user + "," + baseDN;
} else {
  bindDN = user;
}
{code}
This causes problems for LDAP implementations that expect cn= first.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
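One way to make the hard-coded `uid=` prefix generic is to take the RDN attribute from configuration. The sketch below is hypothetical (the `guidKey` parameter name and the helper class are illustrative, not Hive's actual API), but it shows the shape of the fix the report asks for:

```java
// Hypothetical sketch: build the bind DN from a configurable RDN attribute
// instead of hard-coding "uid=". The "guidKey" name is an assumption for
// illustration, not an actual Hive configuration property.
public class BindDnBuilder {
    static String buildBindDN(String user, String baseDN, String guidKey) {
        if (baseDN != null) {
            // e.g. guidKey = "cn" yields "cn=jdoe,dc=example,dc=com"
            return guidKey + "=" + user + "," + baseDN;
        }
        return user; // caller already supplied a full DN
    }

    public static void main(String[] args) {
        System.out.println(buildBindDN("jdoe", "dc=example,dc=com", "cn"));
        // prints: cn=jdoe,dc=example,dc=com
    }
}
```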
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5783: - Component/s: Serializers/Deserializers Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5783: - Fix Version/s: (was: 0.11.0) Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846797#comment-13846797 ] Eric Hanson commented on HIVE-5783: --- Could somebody put the patch on ReviewBoard? That'd make it easier to look at. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6014: - Fix Version/s: tez-branch Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5991) ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB encoding
[ https://issues.apache.org/jira/browse/HIVE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5991: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Prasanth! ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB encoding --- Key: HIVE-5991 URL: https://issues.apache.org/jira/browse/HIVE-5991 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.13.0 Attachments: HIVE-5991.1.patch PATCHED_BLOB encoding creates mask with number of bits required for 95th percentile value. If the 95th percentile value requires 32 bits then the mask creation will result in integer overflow. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
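The overflow described in HIVE-5991 can be reproduced in isolation: when a bit mask is built with `int` arithmetic, it collapses exactly when 32 bits are required, because Java masks an `int` shift distance to its low 5 bits. This is a minimal sketch of the failure mode, not ORC's actual mask-creation code:

```java
public class MaskOverflow {
    public static void main(String[] args) {
        int bits = 32;
        // Broken: for int operands Java uses only the low 5 bits of the shift
        // distance, so 1 << 32 == 1 and the "mask" collapses to 0.
        int badMask = (1 << bits) - 1;
        System.out.println(badMask);                    // prints 0, not -1 (0xFFFFFFFF)

        // Fix: compute the mask in 64-bit arithmetic, where shift distances
        // up to 63 are honored.
        long goodMask = (1L << bits) - 1;
        System.out.println(Long.toHexString(goodMask)); // prints ffffffff
    }
}
```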
[jira] [Updated] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5994: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Prasanth! ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits ) Key: HIVE-5994 URL: https://issues.apache.org/jira/browse/HIVE-5994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.13.0 Attachments: HIVE-5994.1.patch For large negative BIGINTs, zigzag encoding will yield large value (64bit value) with MSB set to 1. This value is interpreted as negative value in SerializationUtils.findClosestNumBits(long value) function. This resulted in wrong computation of total number of bits required which results in wrong encoding/decoding of values. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
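The zigzag mechanics behind HIVE-5994 can be seen with the standard encoding `(n << 1) ^ (n >> 63)`: the most negative bigint zigzag-encodes to a value with all 64 bits set, which reads as -1 if treated as a signed long. Counting significant bits must therefore use unsigned semantics. A minimal sketch (not ORC's actual writer code):

```java
public class ZigZag {
    // Standard zigzag: 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ...
    static long zigzagEncode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    public static void main(String[] args) {
        long encoded = zigzagEncode(Long.MIN_VALUE);
        // All 64 bits are set: interpreted as a signed long this is -1,
        // which is where a signed bit-width computation goes wrong.
        System.out.println(encoded);                                 // -1
        // The unsigned view needs all 64 significant bits.
        System.out.println(64 - Long.numberOfLeadingZeros(encoded)); // 64
    }
}
```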
[jira] [Updated] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6004: --- Status: Patch Available (was: Open) Marking Patch Available to get Hive QA run. Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846836#comment-13846836 ] Prasanth J commented on HIVE-6004: -- [~ashutoshc] these are hadoop2 tests. Will Hive QA run hadoop2 tests as well? Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846837#comment-13846837 ] Ashutosh Chauhan commented on HIVE-6016: Sorry, I was confused; those are not two loops, but a constructor and an overloaded method. Patch looks good. +1 Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null and using that to decide if it has any more entries; thus (a,b,_s) accidentally becomes (a,b). There's a boundary condition on the very first pick, which causes (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus we wind up with an unfiltered (_s,a,b) which ORC breaks on. The effect of this bug is that ORC will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the file listing. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
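A robust filtering iterator avoids the peek-for-null pitfall the report describes by advancing eagerly and tracking "have a value" with an explicit flag, so a filtered entry in the first position is handled the same as anywhere else. This is a generic sketch of the look-ahead pattern, not the actual Hadoop23Shims code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Look-ahead filtering iterator: the filter is applied before the first
// element is handed out, and end-of-iteration is an explicit flag rather
// than a null sentinel.
public class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Predicate<T> filter;
    private T next;
    private boolean hasNext;

    public FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
        this.inner = inner;
        this.filter = filter;
        advance(); // prime the first value, applying the filter from the start
    }

    private void advance() {
        hasNext = false;
        while (inner.hasNext()) {
            T candidate = inner.next();
            if (filter.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return hasNext; }

    @Override public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        // (_s, a, b) with a filter dropping names starting with "_" yields (a, b).
        Iterator<String> it = new FilteringIterator<>(
            Arrays.asList("_SUCCESS", "a", "b").iterator(), s -> !s.startsWith("_"));
        while (it.hasNext()) System.out.println(it.next()); // prints a, then b
    }
}
```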
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846843#comment-13846843 ] Ashutosh Chauhan commented on HIVE-6004: No, but there is stats_partialscan_autogather, for example, which looks like it'll run for hadoop-1 also. There might be others too. So, I wanted to make sure. Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846848#comment-13846848 ] Prasanth J commented on HIVE-6004: -- Makes sense. Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions
[ https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846888#comment-13846888 ] Matt Tucker commented on HIVE-5555: --- It's a nice-to-have. Just wanted to make sure that syntax was covered since it's similar to the other examples. Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions Key: HIVE-5555 URL: https://issues.apache.org/jira/browse/HIVE-5555 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Attachments: AlternativeJoinSyntax.pdf Certain tools still generate `old style' join queries where the join condition is in the WHERE clause. A related set of issues that can be addressed is that of pushing forward joining conditions, in a manner similar to the Predicate Pushdown feature of Hive. For example, these queries can have join conditions pushed down:
{noformat}
-- query 1, push join predicate from 2nd join to 1st
explain select * from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name = p3.p_name;
-- query 2
explain select * from part p1 join part p2 join part p3 where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6027) non-vectorized log10 has rounding issue
Sergey Shelukhin created HIVE-6027: -- Summary: non-vectorized log10 has rounding issue Key: HIVE-6027 URL: https://issues.apache.org/jira/browse/HIVE-6027 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial In HIVE-6010, I found that vectorized and non-vectorized log10 may produce different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out that the vectorized one uses Math.log10, but the non-vectorized one uses log/log(10). Both should use Math.log10. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
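The difference is a last-ulp effect: `Math.log10` is documented to return exactly `n` for arguments of the form 10^n, whereas dividing two natural logarithms accumulates rounding and can land one ulp off. A small demonstration (illustrative, not Hive's UDF code):

```java
public class Log10Precision {
    public static void main(String[] args) {
        // Math.log10 is specified to be exact for integer powers of ten.
        System.out.println(Math.log10(1000.0)); // 3.0

        // Dividing natural logs is only approximately equal; the two
        // rounding steps can shift the last bit of the mantissa.
        double viaDivision = Math.log(1000.0) / Math.log(10.0);
        System.out.println(viaDivision);
        // On typical JDKs this prints 2.9999999999999996 rather than 3.0,
        // matching the last-digit discrepancy reported in the issue.
    }
}
```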
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6016: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Prasanth! Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6027) non-vectorized log10 has rounding issue
[ https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6027: --- Attachment: HIVE-6027.patch trivial patch non-vectorized log10 has rounding issue --- Key: HIVE-6027 URL: https://issues.apache.org/jira/browse/HIVE-6027 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-6027.patch In HIVE-6010, I found that vectorized and non-vectorized log10 may produce different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). Both should use Math.log10. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6027) non-vectorized log10 has rounding issue
[ https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6027: --- Status: Patch Available (was: Open) non-vectorized log10 has rounding issue --- Key: HIVE-6027 URL: https://issues.apache.org/jira/browse/HIVE-6027 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-6027.patch In HIVE-6010, I found that vectorized and non-vectorized log10 may produce different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). Both should use Math.log10. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6028) Partition predicate literals are not interpreted correctly.
Pala M Muthaia created HIVE-6028: Summary: Partition predicate literals are not interpreted correctly. Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia
When parsing/analyzing a query, Hive treats the partition predicate value as an int instead of a string. This breaks down and leads to incorrect results when the partition predicate value starts with the digit 0, e.g. hour=00, hour=05, etc. The following repro illustrates the bug:
{noformat}
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1;

-- this query returns incorrect results, i.e. just an empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns the correct result. Note the predicate value is a string literal.
select * from test_partition_pred where hour='00';
OK
21	00
{noformat}
The explain plan illustrates how the query was interpreted. In particular, the partition predicate is pushed down as a regular filter clause, with hour=0 as the predicate.
{noformat}
explain select * from test_partition_pred where hour=00;
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) 00))))
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_partition_pred
          Filter Operator
            predicate:
                expr: (hour = 0)
                type: boolean
            Select Operator
              expressions:
                    expr: col1
                    type: int
                    expr: hour
                    type: string
              outputColumnNames: _col0, _col1
              ListSink

-- comparing plan for query with correct result
explain select * from test_partition_pred where hour='00';
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) '00'))))
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_partition_pred
          Select Operator
            expressions:
                  expr: col1
                  type: int
                  expr: hour
                  type: string
            outputColumnNames: _col0, _col1
            ListSink
{noformat}
Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11.
3. Not an issue when the partition value starts with an integer other than 0, e.g. hour=10, hour=11, etc.
4. As seen above, the workaround is to use a string literal: hour='00', etc.
This would not be too bad if, in the failing case, Hive complained that partition hour=0 is not found, or complained that the literal type doesn't match the column type. Instead Hive silently pushes it down as a filter clause, and the query succeeds with an empty set as the result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09.
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
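The root of the empty result is that the leading zero does not survive numeric interpretation: once `00` is parsed as an int it becomes `0`, so any later comparison against the partition value string "00" fails. A minimal sketch of that coercion (illustrative only, not Hive's planner code):

```java
public class PartitionLiteral {
    public static void main(String[] args) {
        // The literal 00 survives parsing only as the int 0...
        int parsed = Integer.parseInt("00");
        System.out.println(parsed); // 0: the leading zero is gone

        // ...so a string-wise comparison against the partition value "00" fails.
        System.out.println(String.valueOf(parsed).equals("00")); // false

        // Keeping the predicate a string literal, as in hour='00', is what
        // makes the match succeed.
        System.out.println("00".equals("00")); // true
    }
}
```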
[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pala M Muthaia updated HIVE-6028: - Attachment: Hive-6028-explain-plan.txt Partition predicate literals are not interpreted correctly. --- Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Attachments: Hive-6028-explain-plan.txt When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 2100 Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pala M Muthaia updated HIVE-6028: - Description: When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 21 00 Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. was: When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. 
The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 21 00 -- explain plan illustrates how the query was interpreted. Particularly the partition predicate is pushed down as regular filter clause, with hour=0 as predicate. explain select * from test_partition_pred where hour=00; ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) 00 STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test_partition_pred Filter Operator predicate: expr: (hour = 0) type: boolean Select Operator expressions: expr: col1 type: int expr: hour type: string outputColumnNames: _col0, _col1 ListSink -- comparing plan for query with correct result explain select * from test_partition_pred where hour='00'; ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) '00' STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test_partition_pred Select Operator expressions: expr: col1 type: int expr: hour type: string outputColumnNames: _col0, _col1 ListSink Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. 
This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00
[jira] [Commented] (HIVE-5966) Fix eclipse:eclipse post shim aggregation changes
[ https://issues.apache.org/jira/browse/HIVE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846919#comment-13846919 ] Szehon Ho commented on HIVE-5966: - Hi [~brocknoland], wondering if this can be committed as it was reviewed on ReviewBoard, or if it needs more thought? Not very urgent, but it would help productivity by getting rid of eclipse errors. Fix eclipse:eclipse post shim aggregation changes - Key: HIVE-5966 URL: https://issues.apache.org/jira/browse/HIVE-5966 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Szehon Ho Attachments: HIVE-5966.1.patch, HIVE-5966.patch The shim bundle module marks its deps provided so users of the bundle won't pull in the child dependencies. This causes the eclipse workspace generated by eclipse:eclipse to fail because it only includes the source from the bundle source directory, which is empty. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pala M Muthaia updated HIVE-6028:

Description: When parsing/analyzing a query, Hive treats the partition predicate value as an int instead of a string. This breaks down and leads to incorrect results when the partition predicate value starts with the digit 0, e.g. hour=00, hour=05, etc. The following repro illustrates the bug:

{noformat}
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1;

-- this query returns incorrect results, i.e. just an empty set
select * from test_partition_pred where hour=00;
OK

-- this query returns the correct result; note the predicate value is a string literal
select * from test_partition_pred where hour='00';
OK
21	00
{noformat}

The explain plan shows how the query was interpreted; in particular, the partition predicate is pushed down as a regular filter clause, with hour=0 as the predicate. See the attached explain plan file.

Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12; this used to work in Hive 0.11.
3. Not an issue when the partition value starts with a digit other than 0, e.g. hour=10, hour=11, etc.
4. As seen above, the workaround is to use a string literal, hour='00' etc.

This would not be too bad if, in the failing case, Hive complained that partition hour=0 is not found, or complained that the literal type doesn't match the column type. Instead Hive silently pushes it down as a filter clause, and the query succeeds with an empty set as the result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09.

was: (the previous description, identical except for the explain-plan paragraph above)

Partition predicate literals are not interpreted correctly.
---
Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Attachments: Hive-6028-explain-plan.txt
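The root cause of the empty result set reads naturally in plain Java: once the parser types the literal 00 as an int, its canonical form is 0, which can never match the stored STRING partition value "00". A minimal illustrative sketch (not Hive code; the class and variable names are made up):

```java
// Illustration of the HIVE-6028 symptom: an int-typed literal 00 collapses
// to 0, so a string comparison against the stored partition value "00" fails.
public class PartitionLiteralDemo {
    public static void main(String[] args) {
        // The parser reads the literal 00 as the integer 0...
        int literal = Integer.parseInt("00");
        System.out.println(literal); // 0

        // ...so the pushed-down filter effectively compares against "0",
        // which never equals the stored partition value "00".
        String storedPartitionValue = "00";
        System.out.println(storedPartitionValue.equals(String.valueOf(literal))); // false

        // A string literal in the query compares correctly.
        System.out.println(storedPartitionValue.equals("00")); // true
    }
}
```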
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846936#comment-13846936 ] Hive QA commented on HIVE-6004: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618195/HIVE-6004.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4779 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/624/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/624/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12618195 Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16207/#review30300 --- ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java https://reviews.apache.org/r/16207/#comment58009 It would be nice to remove the leading tabs. - Xuefu Zhang On Dec. 12, 2013, 8:20 p.m., Prasad Mujumdar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16207/ --- (Updated Dec. 12, 2013, 8:20 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-1466 https://issues.apache.org/jira/browse/HIVE-1466 Repository: hive-git Description --- Support configurable null format for tables and writing out to directory. Using a non-default null format is a bit cumbersome while creating a table, and pretty much impossible to export the data to the local filesystem using insert overwrite directory. The patch enhances the SQL syntax to support the 'NULL DEFINED AS' construct for create table as well as insert overwrite directory. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5e5b8cf ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d0a0ec7 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16207/diff/ Testing --- Added new tests. Thanks, Prasad Mujumdar
[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846939#comment-13846939 ] Xuefu Zhang commented on HIVE-1466: --- Patch looks good. Minor comment on RB. Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Assignee: Prasad Mujumdar Attachments: HIVE-1466.1.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
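For illustration, a sketch of how the construct described in the patch would be used (table and directory names here are made up, and the exact placement of the clause is an assumption based on the patch description, not confirmed committed syntax):

{noformat}
-- hypothetical usage of NULL DEFINED AS; names are illustrative
CREATE TABLE t (a INT, b STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  NULL DEFINED AS '\N';

-- same clause applied when exporting to a local directory
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_out'
ROW FORMAT DELIMITED NULL DEFINED AS 'NULL'
SELECT * FROM t;
{noformat}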
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Attachment: HIVE-6010.patch Here's the patch, with one example test. More tests can be added in other JIRAs (incl. for metastore stuff I mentioned, maybe). Depending on whether this or logarithm fix goes first, I will uncomment logarithms in this test here or there. Or in another jira. Already found one bug using this :) HIVE-6027. create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6010.patch So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Status: Patch Available (was: Open) create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6010.patch So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846949#comment-13846949 ] Ashutosh Chauhan commented on HIVE-6004: As suspected : ) Let's take out stats_partialscan_autogather from this patch, get this one committed, and analyze auto_stats_partialscan in a different jira. Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16229: HIVE-6010 create a test that would ensure vectorization produces same results as non-vectorized execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16229/ --- Review request for hive and Jitendra Pandey. Bugs: HIVE-6010 https://issues.apache.org/jira/browse/HIVE-6010 Repository: hive-git Description --- See jira. Diffs - ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 85351aa itests/qtest/pom.xml 8c249a0 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java c16e82d ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLog10.java 4b6dc6a ql/src/test/queries/clientcompare/vectorized_math_funcs.q PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_00.qv PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_01.qv PRE-CREATION ql/src/test/templates/TestCompareCliDriver.vm PRE-CREATION Diff: https://reviews.apache.org/r/16229/diff/ Testing --- Thanks, Sergey Shelukhin
Re: Review Request 16229: HIVE-6010 create a test that would ensure vectorization produces same results as non-vectorized execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16229/ --- (Updated Dec. 13, 2013, 12:09 a.m.) Review request for hive and Jitendra Pandey. Bugs: HIVE-6010 https://issues.apache.org/jira/browse/HIVE-6010 Repository: hive-git Description --- See jira. Diffs - ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 85351aa itests/qtest/pom.xml 8c249a0 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java c16e82d ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLog10.java 4b6dc6a ql/src/test/queries/clientcompare/vectorized_math_funcs.q PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_00.qv PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_01.qv PRE-CREATION ql/src/test/templates/TestCompareCliDriver.vm PRE-CREATION Diff: https://reviews.apache.org/r/16229/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846954#comment-13846954 ] Sergey Shelukhin commented on HIVE-6010: https://reviews.apache.org/r/16229/ create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6010.patch So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-2093: Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution [~navis]! Can you please update the release note section so that we can add that to wiki docs ? (If you prefer, you can also update the wiki docs directly) create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: Bug Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-2093: Issue Type: New Feature (was: Bug) create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions
[ https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5555: Issue Type: New Feature (was: Bug) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions Key: HIVE-5555 URL: https://issues.apache.org/jira/browse/HIVE-5555 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Attachments: AlternativeJoinSyntax.pdf Certain tools still generate `old style' Join queries where the join condition is in the Where clause. A related set of issues that can be addressed is that of pushing forward joining conditions; in a manner similar to the Predicate Pushdown feature of Hive. For e.g. these queries can have join conditions pushed down: {noformat} - query 1, push join predicate from 2nd join to 1st explain select * from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name = p3.p_name; - query 2 explain select * from part p1 join part p2 join part p3 where p1.p_name = p2.p_name and p2.p_name = p3.p_name; {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6027) non-vectorized log10 has rounding issue
[ https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846996#comment-13846996 ] Hive QA commented on HIVE-6027: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618494/HIVE-6027.patch {color:green}SUCCESS:{color} +1 4779 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/625/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/625/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618494 non-vectorized log10 has rounding issue --- Key: HIVE-6027 URL: https://issues.apache.org/jira/browse/HIVE-6027 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-6027.patch In HIVE-6010, I found that vectorized and non-vectorized log10 may produce different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). Both should use Math.log10. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
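The difference is easy to reproduce in plain Java. A minimal sketch (the class name is made up) contrasting the two ways of computing log10: per the Math.log10 Javadoc, exact powers of ten return exact integer results, while the log/log(10) quotient is only accurate to about 1 ulp and may differ in the last digit of the mantissa:

```java
// Sketch of the rounding difference described above: computing log10 via
// Math.log(x) / Math.log(10) need not be bit-identical to Math.log10(x).
public class Log10Demo {
    public static void main(String[] args) {
        double x = 1000.0;

        // Math.log10 is specified to return n exactly for arguments of 10^n.
        double direct = Math.log10(x);              // 3.0

        // The quotient form is correct to ~1 ulp but may land on a value
        // such as 2.9999999999999996 on some JVMs.
        double quotient = Math.log(x) / Math.log(10);

        System.out.println(direct);
        System.out.println(quotient);

        // This last-digit mismatch is exactly what the vectorized vs.
        // non-vectorized comparison test in HIVE-6010 flagged.
        System.out.println(direct == quotient);
    }
}
```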
Re: Review Request 16167: HIVE-5595 Implement Vectorized SMB Join
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16167/#review30289 --- ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57976 Please remove trailing white space in all your code. You can set the eclipse editor to do this. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57980 Can you add a comment about the purpose of this class and the major differences from regular SMB Join? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57985 Excellent variable names ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57984 Please add a few comments in the body explaining the major sections ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57983 Does the fact that this is a map from Byte mean there is a limit of 127 ANDed filter expressions? I guess that is enough for most purposes but it seems like a low internal limit. Not sure if this is a limitation inherited from someplace else. 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57987 need blank after = ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57989 Sun Java coding standards say put blanks around =, ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57988 and replacing them ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57993 please put blanks around : ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment57998 good comment! spell out atm ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment58020 I don't understand this because the body of the loop does not change for each trip through the loop. It looks like you are doing the same thing inBatch.size times. Is this right? If so, please explain. Should tag be batchIndex? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment58021 Please add comment before method ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java https://reviews.apache.org/r/16167/#comment58022 add blanks around operators ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java https://reviews.apache.org/r/16167/#comment58023 Please comment the tests to explain what you are checking - Eric Hanson On Dec. 11, 2013, 7:26 a.m., Remus Rusanu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16167/ --- (Updated Dec. 11, 2013, 7:26 a.m.) Review request for hive, Ashutosh Chauhan, Eric Hanson, and Jitendra Pandey. 
Bugs: HIVE-5595 https://issues.apache.org/jira/browse/HIVE-5595 Repository: hive-git Description --- See HIVE-5595. I will post a description. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 24a812d ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 81a1232 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 19f7d79 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/CommonRCFileInputFormat.java 4bfeb20 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java abdc165 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 7859e56 ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java 02031ea ql/src/test/queries/clientpositive/vectorized_bucketmapjoin1.q PRE-CREATION ql/src/test/results/clientpositive/vectorized_bucketmapjoin1.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16167/diff/ Testing --- New .q file, manually tested several cases Thanks, Remus Rusanu
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847002#comment-13847002 ] Hive QA commented on HIVE-6010: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618503/HIVE-6010.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/626/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/626/console Messages: {noformat} This message was trimmed, see log for full details
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2399,49] cannot find symbol symbol : class UnlockDatabaseDesc location: class org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2431,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2443,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2493,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2513,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2529,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2541,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2589,35] reference to DDLWork is ambiguous, both method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc) in org.apache.hadoop.hive.ql.plan.DDLWork and method DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc) in
[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN
[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847003#comment-13847003 ] Eric Hanson commented on HIVE-5595: --- Hi Remus, Overall this looks good! Please see my comments on ReviewBoard. Eric Implement vectorized SMB JOIN - Key: HIVE-5595 URL: https://issues.apache.org/jira/browse/HIVE-5595 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch Original Estimate: 168h Remaining Estimate: 168h -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6027) non-vectorized log10 has rounding issue
[ https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847071#comment-13847071 ] Ashutosh Chauhan commented on HIVE-6027: +1 non-vectorized log10 has rounding issue --- Key: HIVE-6027 URL: https://issues.apache.org/jira/browse/HIVE-6027 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-6027.patch In HIVE-6010, I found that vectorized and non-vectorized log10 may produce different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). Both should use Math.log10. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847078#comment-13847078 ] Vaibhav Gumashta commented on HIVE-5924: Thanks [~jaideepdhok]. Couple of questions: 1. Would enabling the per session/operation log config mean that there will be no consolidated log? 2. Regarding 6.), there is an open JIRA - [HIVE-5268|https://issues.apache.org/jira/browse/HIVE-5268] which has some overlap. There is also a different approach taken here [HIVE-5799|https://issues.apache.org/jira/browse/HIVE-5799], which is being discussed. I'd be curious to hear what your method of detecting abandoned sessions is. Look forward to the patch. Thanks! Save operation logs in per operation directories in HiveServer2 --- Key: HIVE-5924 URL: https://issues.apache.org/jira/browse/HIVE-5924 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaideep Dhok Assignee: Jaideep Dhok -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5924: --- Affects Version/s: 0.13.0 Save operation logs in per operation directories in HiveServer2 --- Key: HIVE-5924 URL: https://issues.apache.org/jira/browse/HIVE-5924 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaideep Dhok Assignee: Jaideep Dhok -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Reopened] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reopened HIVE-2093: -- create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847104#comment-13847104 ] Gunther Hagleitner commented on HIVE-2093: -- [~thejas] This is breaking the build. I think you might have forgotten to add some files (UnlockDatabaseDesc/LockDatabaseDesc)? create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847110#comment-13847110 ] Gunther Hagleitner commented on HIVE-2093: -- I think it's just the two files. I will commit those (from patch .9) create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847113#comment-13847113 ] Gunther Hagleitner commented on HIVE-2093: -- Committed UnlockDatabaseDesc and LockDatabaseDesc. Build is working again for me. create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-2093. -- Resolution: Fixed create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: New Feature Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Fix For: 0.13.0 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6000: - Fix Version/s: 0.13.0 Hive build broken on hadoop2 Key: HIVE-6000 URL: https://issues.apache.org/jira/browse/HIVE-6000 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Vikram Dixit K Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6000.1.patch When I build on hadoop2 since yesterday, I get {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project hive-it-unit: Compilation failure: Compilation failure: [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41] package org.apache.hadoop.hbase.zookeeper does not exist [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11] cannot find symbol [ERROR] symbol : class MiniZooKeeperCluster [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26] cannot find symbol [ERROR] symbol : class MiniZooKeeperCluster {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6000: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks Vikram! Hive build broken on hadoop2 Key: HIVE-6000 URL: https://issues.apache.org/jira/browse/HIVE-6000 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Vikram Dixit K Priority: Blocker Attachments: HIVE-6000.1.patch When I build on hadoop2 since yesterday, I get {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile (default-testCompile) on project hive-it-unit: Compilation failure: Compilation failure: [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41] package org.apache.hadoop.hbase.zookeeper does not exist [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11] cannot find symbol [ERROR] symbol : class MiniZooKeeperCluster [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore [ERROR] /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26] cannot find symbol [ERROR] symbol : class MiniZooKeeperCluster {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16207/ --- (Updated Dec. 13, 2013, 3:35 a.m.) Review request for hive and Xuefu Zhang. Changes --- Fixed formatting (tabs) Fixed 'show create table' to support null format Added testcase for CTAS Bugs: HIVE-1466 https://issues.apache.org/jira/browse/HIVE-1466 Repository: hive-git Description --- Support configurable null format for tables and writing out to directory. Using a non-default null format is a bit cumbersome while creating a table, and it is pretty much impossible to export the data to the local filesystem using insert overwrite directory. The patch enhances the SQL syntax to support the 'NULL DEFINED AS' construct for create table as well as insert overwrite directory. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 41df473 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java fdc0d1a ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g b146df6 ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ace1df9 ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION ql/src/test/queries/clientpositive/nullformatCTAS.q PRE-CREATION ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION ql/src/test/results/clientpositive/nullformatCTAS.q.out PRE-CREATION ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16207/diff/ Testing --- Added new tests. Thanks, Prasad Mujumdar
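For reference, the syntax this patch introduces looks roughly like the sketch below. The table, column, and directory names are invented for illustration; the exact grammar is defined by the HiveLexer.g/HiveParser.g changes in the diff:

```sql
-- Create-table form (hypothetical table):
CREATE TABLE null_fmt_demo (id INT, name STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  NULL DEFINED AS '\003';

-- Export form, writing out to a local directory:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/null_fmt_out'
  ROW FORMAT DELIMITED NULL DEFINED AS 'NULL'
SELECT * FROM null_fmt_demo;
```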
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-1466: -- Attachment: HIVE-1466.2.patch Addressed review comments, added more test cases Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Assignee: Prasad Mujumdar Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5996: -- Status: Open (was: Patch Available) Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+--------+
| sum(l) |
+--------+
|   1221 |
+--------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows; overflowing with only two rows is unacceptable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847150#comment-13847150 ] Xuefu Zhang commented on HIVE-5996: --- {quote} It would be great if you can document the theory, otherwise I still would need to look at code to understand the theory {quote} I will put it somewhere on the wiki. {quote} You can have an overflow even with decimal type, if they are large enough, with just two rows. {quote} It's impossible to overflow the output decimal type with just two rows because the precision of the output decimal type is 10 plus the precision of the input type. In the case of long input, the output decimal type is (29,0). Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+--------+
| sum(l) |
+--------+
|   1221 |
+--------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows; overflowing with only two rows is unacceptable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
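The overflow behavior under discussion can be illustrated outside Hive. This is a minimal Java sketch (not Hive's actual GenericUDAFSum implementation; class and method names are invented) contrasting a silently wrapping long accumulator with an exact decimal accumulator like the one the patch proposes:

```java
import java.math.BigDecimal;

public class SumOverflow {
    // A plain long accumulator wraps around silently on overflow,
    // which is the wrap-around error the JIRA describes.
    static long sumLong(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v; // wraps past Long.MAX_VALUE without any error
        }
        return sum;
    }

    // Accumulating in BigDecimal is exact; with an output precision of
    // 10 + the input precision (29 digits for bigint input), two long
    // rows cannot overflow it.
    static BigDecimal sumDecimal(long[] values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (long v : values) {
            sum = sum.add(BigDecimal.valueOf(v));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLong(new long[]{666L, 555L}));              // 1221
        System.out.println(sumLong(new long[]{Long.MAX_VALUE, 1L}));      // wraps to Long.MIN_VALUE
        System.out.println(sumDecimal(new long[]{Long.MAX_VALUE, 1L}));   // exact: 9223372036854775808
    }
}
```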
[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847153#comment-13847153 ] Jaideep Dhok commented on HIVE-5924: bq. we can close the session We will not actually close the session, just delete the log files. Save operation logs in per operation directories in HiveServer2 --- Key: HIVE-5924 URL: https://issues.apache.org/jira/browse/HIVE-5924 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaideep Dhok Assignee: Jaideep Dhok -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847151#comment-13847151 ] Jaideep Dhok commented on HIVE-5924: [~vgumashta] Thanks for looking at the issue. bq. 1. Would enabling the per session/operation log config mean that there will be no consolidated log? HiveServer2 logs like session open, session close etc will continue to be consolidated. Only the query logs like job client logs, driver or task logs will be redirected. Turning off the log redirection would again consolidate everything into a single log file as is done currently. bq. I'd be curious to hear what your method of detecting abandoned sessions is. For detecting abandoned sessions w.r.t. log purging, I can check the last modified time of an operation log file. If that is older than a configured value, we can close the session. Save operation logs in per operation directories in HiveServer2 --- Key: HIVE-5924 URL: https://issues.apache.org/jira/browse/HIVE-5924 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Jaideep Dhok Assignee: Jaideep Dhok -- This message was sent by Atlassian JIRA (v6.1.4#6159)
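The last-modified-time check described above could be sketched as follows. StaleLogScanner, findStale, and the maxIdleMillis threshold are hypothetical names for illustration only, not part of the actual HIVE-5924 patch; per the comment, only the log files would be purged, not the session:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class StaleLogScanner {
    // True if a log file last touched at 'lastModified' has been idle
    // longer than 'maxIdleMillis' as of time 'now'.
    static boolean isStale(long lastModified, long maxIdleMillis, long now) {
        return now - lastModified > maxIdleMillis;
    }

    // Collects operation log files in 'logDir' that look abandoned; the
    // caller would then delete them (the session itself stays open).
    static File[] findStale(File logDir, long maxIdleMillis, long now) {
        List<File> stale = new ArrayList<File>();
        File[] files = logDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && isStale(f.lastModified(), maxIdleMillis, now)) {
                    stale.add(f);
                }
            }
        }
        return stale.toArray(new File[0]);
    }
}
```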
Hive-trunk-hadoop2 - Build # 594 - Still Failing
Changes for Build #560 [xuefu] HIVE-5356: Move arithmatic UDFs to generic UDF implementations (reviewed by Brock) [hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus Rusanu via Ashutosh Chauhan) [hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when available (Nick Dimiduk via Ashutosh Chauhan) [hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via Ashutosh Chauhan) Changes for Build #561 [hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via Ashutosh Chauhan) [thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene Koifman via Thejas Nair) [hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct wrapped ByteBuffers (Gopal V via Owen Omalley) [xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 (reviewed by Brock) [brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #562 [hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable (Remus Rusanu via Ashutosh Chauhan) Changes for Build #563 [thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a secure cluster (Prasad Mujumdar via Thejas Nair) Changes for Build #564 Changes for Build #565 [thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled (Thejas Nair reviewed by Navis) Changes for Build #566 Changes for Build #567 [hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having clause (Harish Butani via Ashutosh Chauhan) Changes for Build #568 [xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced parenthesises (reviewed by Ashutosh) Changes for Build #569 Changes for Build #570 [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani) [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via 
Ashutosh Chauhan) Changes for Build #571 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for 
Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson)
Hive-trunk-h0.21 - Build # 2496 - Still Failing
Changes for Build #2461 [xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 (reviewed by Brock) [brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland reviewed by Prasad Mujumdar) [xuefu] HIVE-5356: Move arithmatic UDFs to generic UDF implementations (reviewed by Brock) Changes for Build #2462 [hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable (Remus Rusanu via Ashutosh Chauhan) [hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via Ashutosh Chauhan) [thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene Koifman via Thejas Nair) [hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct wrapped ByteBuffers (Gopal V via Owen Omalley) Changes for Build #2463 Changes for Build #2464 [thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a secure cluster (Prasad Mujumdar via Thejas Nair) Changes for Build #2465 Changes for Build #2466 [thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled (Thejas Nair reviewed by Navis) Changes for Build #2467 Changes for Build #2468 [hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having clause (Harish Butani via Ashutosh Chauhan) Changes for Build #2469 [xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced parenthesises (reviewed by Ashutosh) Changes for Build #2470 Changes for Build #2471 [rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the absence of any column statistics (Prasanth Jayachandran via Harish Butani) [hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis via Ashutosh Chauhan) Changes for Build #2472 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) 
[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis)
[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847166#comment-13847166 ] Hive QA commented on HIVE-1466: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618535/HIVE-1466.2.patch {color:green}SUCCESS:{color} +1 4788 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618535 Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: Improvement Reporter: Adam Kramer Assignee: Prasad Mujumdar Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16239: HIVE-6022 Load statements with incorrect order of partitions put input files to unreadable places
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16239/ --- Review request for hive. Bugs: HIVE-6022 https://issues.apache.org/jira/browse/HIVE-6022 Repository: hive-git Description --- HIVE-6022 Load statements with incorrect order of partitions put input files to unreadable places Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 ql/src/test/queries/clientpositive/loadpart2.q PRE-CREATION ql/src/test/results/clientpositive/loadpart2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16239/diff/ Testing --- Thanks, Teruyoshi Zenmyo
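One plausible reading of the bug title, as a hypothetical illustration (the table and partition names are invented; the actual reproduction is in the new loadpart2.q test):

```sql
-- A table partitioned by (ds, hr), in that order:
CREATE TABLE part_demo (c STRING) PARTITIONED BY (ds STRING, hr STRING);

-- Listing the partition columns in the wrong order in LOAD DATA:
LOAD DATA LOCAL INPATH '/tmp/data.txt'
  INTO TABLE part_demo PARTITION (hr='10', ds='2013-12-13');
-- Before the fix, the files could land under an hr=10/ds=2013-12-13
-- directory instead of the table's ds=2013-12-13/hr=10 layout,
-- leaving them unreadable by subsequent queries.
```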