Re: Review Request 16184: Hive should be able to skip header and footer rows when reading data file for a table (HIVE-5795)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/#review30523 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/16184/#comment58465
skipHeader and initiaizeFooterBuf can be moved to a common util class and shared. We just need to pass the member variables as additional params.

ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/16184/#comment58463
Code such as this block for parsing the header count can be moved to a util class and shared between the two places.

ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/16184/#comment58464
Code such as this block for parsing the header count can be moved to a util class and shared between the two places.

ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/16184/#comment58466
The logic of this block also looks the same in two places; can we move it to a common util function?

- Thejas Nair

On Dec. 11, 2013, 9:19 p.m., Shuaishuai Nie wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/ ---
(Updated Dec. 11, 2013, 9:19 p.m.)

Review request for hive, Eric Hanson and Thejas Nair.
Bugs: HIVE-5795
https://issues.apache.org/jira/browse/HIVE-5795
Repository: hive-git

Description
---
Hive should be able to skip header and footer rows when reading data file for a table (follow up with review https://reviews.apache.org/r/15663/diff/#index_header)

Diffs
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048
conf/hive-default.xml.template c61a0bb
data/files/header_footer_table_1/0001.txt PRE-CREATION
data/files/header_footer_table_1/0002.txt PRE-CREATION
data/files/header_footer_table_1/0003.txt PRE-CREATION
data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION
data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION
data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION
itests/qtest/pom.xml c3cbb89
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java d2b2526
ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 974a5d6
ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975
ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b
ql/src/test/queries/clientnegative/file_with_header_footer_negative.q PRE-CREATION
ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION
ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out PRE-CREATION
ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION
serde/if/serde.thrift 2ceb572
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 22a6168

Diff: https://reviews.apache.org/r/16184/diff/

Testing
---

Thanks,
Shuaishuai Nie
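The review comments above ask for the duplicated header/footer-count parsing to be factored into a shared utility. A minimal sketch of what such a helper might look like; the class name and the table property keys (skip.header.line.count, skip.footer.line.count) are assumptions for illustration, not the committed patch:

```java
import java.util.Properties;

// Hypothetical shared helper for the parsing logic the review points at.
public class HeaderFooterUtil {

    // Parses a non-negative line count from table properties, falling back
    // to 0 when the property is absent or malformed.
    public static int parseLineCount(Properties tblProps, String key) {
        String raw = tblProps.getProperty(key);
        if (raw == null) {
            return 0;
        }
        try {
            return Math.max(Integer.parseInt(raw.trim()), 0);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("skip.header.line.count", "1");
        System.out.println(parseLineCount(props, "skip.header.line.count")); // 1
        System.out.println(parseLineCount(props, "skip.footer.line.count")); // 0 (unset)
    }
}
```

Both FetchOperator and the record-reader side could then call this one method instead of re-implementing the parse, which is the deduplication the reviewer is asking for.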
[jira] [Commented] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task
[ https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850246#comment-13850246 ]

Sun Rui commented on HIVE-5891:
---
[~yhuai] I think we can leave $INTNAME as is for this issue. Do you have any further comments? If not, I can prepare a new patch for review.

Alias conflict when merging multiple mapjoin tasks into their common child mapred task
--
Key: HIVE-5891
URL: https://issues.apache.org/jira/browse/HIVE-5891
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
Attachments: HIVE-5891.1.patch

Use the following test case with HIVE 0.12:
{quote}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
select c.key from (select a.key from src a join src b on a.key=b.key group by a.key) tmp join src c on tmp.key=c.key
union all
select c.key from (select a.key from src a join src b on a.key=b.key group by a.key) tmp join src c on tmp.key=c.key
) x;
{quote}
We will get a NullPointerException from the Union Operator:
{quote}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{quote}
The root cause is in CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().

+--------------+     +--------------+
| MapJoin task |     | MapJoin task |
+--------------+     +--------------+
        \                  /
         \                /
          +--------------+
          |  Union task  |
          +--------------+

CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: the Union task. The two MapJoin tasks have the same alias name for their big tables: $INTNAME, which is the name of the temporary table of a join stream. The aliasToWork map uses the alias as its key, so eventually only the MapJoin operator tree of one MapJoin task is saved into the aliasToWork map of the Union task, while the MapJoin operator tree of the other MapJoin task is lost.

As a result, the Union operator won't be initialized because not all of its parents get initialized (the Union operator itself indicates that it has two parents, but it actually has only one because the other parent is lost). This issue does not exist in HIVE 0.11 and is thus a regression in HIVE 0.12. The proposed solution is to use the query ID as a prefix for the join stream name to avoid the conflict and add sanity check code in
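The aliasToWork collision described above can be sketched with a plain map. This is a toy model, not Hive's actual MapredWork/Task API, and the unique prefixes used in the "fixed" case are hypothetical names chosen only to illustrate why disambiguating the join stream name prevents the overwrite:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AliasConflictSketch {
    // Registers each operator tree under its alias; a duplicate alias silently
    // replaces the earlier tree, which is exactly the HIVE-5891 symptom.
    static Map<String, String> register(String[] aliases, String[] trees) {
        Map<String, String> aliasToWork = new LinkedHashMap<>();
        for (int i = 0; i < aliases.length; i++) {
            aliasToWork.put(aliases[i], trees[i]);
        }
        return aliasToWork;
    }

    public static void main(String[] args) {
        // Both merged MapJoin tasks use $INTNAME for their big table.
        Map<String, String> broken = register(
            new String[] {"$INTNAME", "$INTNAME"},
            new String[] {"mapjoin-tree-1", "mapjoin-tree-2"});
        System.out.println(broken.size()); // 1: one parent of the Union operator is lost

        // With unique join stream names (hypothetical prefixes), both survive.
        Map<String, String> fixed = register(
            new String[] {"1:$INTNAME", "2:$INTNAME"},
            new String[] {"mapjoin-tree-1", "mapjoin-tree-2"});
        System.out.println(fixed.size()); // 2: both operator trees are kept
    }
}
```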
[jira] [Created] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
Adrian Popescu created HIVE-6041:
--
Summary: Incorrect task dependency graph for skewed join optimization
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped.

The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
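The missing edge the report describes can be pictured with a toy task graph (this is not Hive's Task API; the class and method names below are simplified stand-ins): when the ConditionalTask replaces the common join task, the common join's old children must be carried over as dependents of the ConditionalTask, otherwise they become unreachable whenever the map join branch is filtered out at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Toy task-graph model illustrating the dependency fix suggested by the report.
public class SkewJoinDependencySketch {
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
        void addDependentTask(Task t) {
            if (!children.contains(t)) {
                children.add(t);
            }
        }
    }

    public static void main(String[] args) {
        Task commonJoin = new Task("common-join");
        Task moveStage = new Task("move"); // e.g., writes results to the result table
        commonJoin.addDependentTask(moveStage);

        // The ConditionalTask replaces commonJoin in the optimized plan; without
        // copying the old children, moveStage is skipped when no skew exists.
        Task conditional = new Task("conditional");
        for (Task child : commonJoin.children) {
            conditional.addDependentTask(child);
        }
        System.out.println(conditional.children.get(0).name); // move
    }
}
```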
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
--
Description:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

Incorrect task dependency graph for skewed join optimization
--
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
--
Description:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

Incorrect task dependency graph for skewed join optimization
--
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
--
Description:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all tasks following the common join are filtered out. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

Incorrect task dependency graph for skewed join optimization
--
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
--
Description:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes down the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

Incorrect task dependency graph for skewed join optimization
--
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes down the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrian Popescu updated HIVE-6041:
--
Description:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

was:
The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

Incorrect task dependency graph for skewed join optimization
--
Key: HIVE-6041
URL: https://issues.apache.org/jira/browse/HIVE-6041
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hadoop 1.0.3
Reporter: Adrian Popescu
Priority: Critical

The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6042) With Dynamic partitioning, All partitions can not be overwrited
ruish li created HIVE-6042:
--
Summary: With Dynamic partitioning, All partitions can not be overwrited
Key: HIVE-6042
URL: https://issues.apache.org/jira/browse/HIVE-6042
Project: Hive
Issue Type: Bug
Components: Database/Schema
Affects Versions: 0.12.0
Environment: OS: Red Hat Enterprise Linux Server release 6.2, HDFS: CDH-4.2.1, MAPRED: CDH-4.2.1-mr1
Reporter: ruish li
Priority: Minor

step1: create table
drop table if exists t;
create table t(a int) PARTITIONED BY (city_ string);

step2: insert data (table dual has only one value: 'x')
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table t partition(city_) select 1,'beijing' from dual;
insert into table t partition(city_) select 2,'shanghai' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

step3: overwrite table; we can show that
insert overwrite table t partition(city_) select 3,'beijing' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

Here we can see that the partition city_=shanghai still exists, but we expect this partition to be overwritten with dynamic partitioning.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6042) With Dynamic partitioning, All partitions can not be overwrited
[ https://issues.apache.org/jira/browse/HIVE-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ruish li updated HIVE-6042:
---
Description:
step1: create table
drop table if exists t;
create table t(a int) PARTITIONED BY (city_ string);

step2: insert data (table dual has only one value: 'x')
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table t partition(city_) select 1,'beijing' from dual;
insert into table t partition(city_) select 2,'shanghai' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

step3: overwrite table
insert overwrite table t partition(city_) select 3,'beijing' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

Here we can see that the partition city_=shanghai still exists, but we expect this partition to be overwritten with dynamic partitioning.

was:
step1: create table
drop table if exists t;
create table t(a int) PARTITIONED BY (city_ string);

step2: insert data (table dual has only one value: 'x')
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table t partition(city_) select 1,'beijing' from dual;
insert into table t partition(city_) select 2,'shanghai' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

step3: overwrite table; we can show that
insert overwrite table t partition(city_) select 3,'beijing' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

Here we can see that the partition city_=shanghai still exists, but we expect this partition to be overwritten with dynamic partitioning.

With Dynamic partitioning, All partitions can not be overwrited
---
Key: HIVE-6042
URL: https://issues.apache.org/jira/browse/HIVE-6042
Project: Hive
Issue Type: Bug
Components: Database/Schema
Affects Versions: 0.12.0
Environment: OS: Red Hat Enterprise Linux Server release 6.2, HDFS: CDH-4.2.1, MAPRED: CDH-4.2.1-mr1
Reporter: ruish li
Priority: Minor

step1: create table
drop table if exists t;
create table t(a int) PARTITIONED BY (city_ string);

step2: insert data (table dual has only one value: 'x')
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table t partition(city_) select 1,'beijing' from dual;
insert into table t partition(city_) select 2,'shanghai' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

step3: overwrite table
insert overwrite table t partition(city_) select 3,'beijing' from dual;
hive (default)> select * from t;
1 beijing
2 shanghai

Here we can see that the partition city_=shanghai still exists, but we expect this partition to be overwritten with dynamic partitioning.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6043) Document incompatible changes in Hive 0.12 and trunk
[ https://issues.apache.org/jira/browse/HIVE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-6043:
---
Summary: Document incompatible changes in Hive 0.12 and trunk (was: Document incompatible changes)

Document incompatible changes in Hive 0.12 and trunk
--
Key: HIVE-6043
URL: https://issues.apache.org/jira/browse/HIVE-6043
Project: Hive
Issue Type: Task
Reporter: Brock Noland
Priority: Blocker

We need to document incompatible changes. For example:
* HIVE-5372 changed the object inspector hierarchy, breaking most if not all custom serdes
* HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo, so all custom serdes break (fixed by HIVE-5380)
* Hive 0.12 separates MapredWork into MapWork and ReduceWork, which is used by serdes
* HIVE-5411 serializes expressions with Kryo, which are used by custom serdes

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6043) Document incompatible changes
Brock Noland created HIVE-6043:
--
Summary: Document incompatible changes
Key: HIVE-6043
URL: https://issues.apache.org/jira/browse/HIVE-6043
Project: Hive
Issue Type: Task
Reporter: Brock Noland
Priority: Blocker

We need to document incompatible changes. For example:
* HIVE-5372 changed the object inspector hierarchy, breaking most if not all custom serdes
* HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo, so all custom serdes break (fixed by HIVE-5380)
* Hive 0.12 separates MapredWork into MapWork and ReduceWork, which is used by serdes
* HIVE-5411 serializes expressions with Kryo, which are used by custom serdes

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5380) Non-default OI constructors should be supported if for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-5380:
---
Attachment: HIVE-5380.patch

[~xuefuz], can you take a look at this?

Non-default OI constructors should be supported if for backwards compatibility
--
Key: HIVE-5380
URL: https://issues.apache.org/jira/browse/HIVE-5380
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
Attachments: HIVE-5380.patch, HIVE-5380.patch

In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then, custom serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSON SerDe to work until we address the fact that we don't want to have to copy the OI's.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
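Serializing and cloning ObjectInspectors generally requires instantiating them reflectively, and naive reflective instantiation needs a no-arg constructor. A toy illustration (plain JDK reflection, not Hive or Kryo code; the class names are invented) of the backwards-compatibility gap HIVE-5380 addresses for OIs that only expose parameterized constructors:

```java
// Toy illustration: classes exposing only non-default constructors cannot be
// instantiated via the no-arg reflective path that naive deserialization uses.
public class OiInstantiationSketch {
    static class DefaultCtorOI { }                 // has an implicit no-arg constructor
    static class NonDefaultCtorOI {
        final String typeName;
        NonDefaultCtorOI(String typeName) { this.typeName = typeName; }
    }

    // Checks whether reflective no-arg instantiation is even possible.
    static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor();        // throws if no no-arg ctor exists
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(DefaultCtorOI.class));    // true
        System.out.println(hasNoArgConstructor(NonDefaultCtorOI.class)); // false
    }
}
```

Serialization libraries typically work around this with special instantiator strategies rather than requiring every class to grow a default constructor, which is the spirit of the compatibility fix discussed here.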
[jira] [Commented] (HIVE-5380) Non-default OI constructors should be supported if for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850543#comment-13850543 ]

Brock Noland commented on HIVE-5380:
---
Uploaded new patch based on kryo-2.22.

Non-default OI constructors should be supported if for backwards compatibility
--
Key: HIVE-5380
URL: https://issues.apache.org/jira/browse/HIVE-5380
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
Attachments: HIVE-5380.patch, HIVE-5380.patch

In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then, custom serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSON SerDe to work until we address the fact that we don't want to have to copy the OI's.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
Incompatible Changes affecting Serdes and UDFS
Hi, Hive 0.12 made some incompatible changes which impact Serdes, and it appears 0.13 makes more incompatible changes. I created HIVE-6043 to track this; if you know of any more changes than what is described there, please do add them. Thanks! Brock
[jira] [Updated] (HIVE-6029) Add default authorization on database/table creation
[ https://issues.apache.org/jira/browse/HIVE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6029: --- Status: Patch Available (was: Open) Submitting patch for testing. Add default authorization on database/table creation Key: HIVE-6029 URL: https://issues.apache.org/jira/browse/HIVE-6029 Project: Hive Issue Type: Improvement Components: Authorization, Metastore Affects Versions: 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Priority: Minor Attachments: HIVE-6029-1.patch.txt, HIVE-6029.2.patch Default authorization privileges are not set when a database/table is created. This allows a user to create a database/table and not be able to access it through Sentry. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6029) Add default authorization on database/table creation
[ https://issues.apache.org/jira/browse/HIVE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6029: --- Attachment: HIVE-6029.2.patch [~cdrome] I rebased the patch on trunk. How does it look? Add default authorization on database/table creation Key: HIVE-6029 URL: https://issues.apache.org/jira/browse/HIVE-6029 Project: Hive Issue Type: Improvement Components: Authorization, Metastore Affects Versions: 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Priority: Minor Attachments: HIVE-6029-1.patch.txt, HIVE-6029.2.patch Default authorization privileges are not set when a database/table is created. This allows a user to create a database/table and not be able to access it through Sentry. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850557#comment-13850557 ] Brock Noland commented on HIVE-5783: Thanks Remus for creating HIVE-5998. Eric, I think the current patch is stale since it's been decided the Parquet Serde will be contributed to Hive. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5812) HiveServer2 SSL connection transport binds to loopback address by default
[ https://issues.apache.org/jira/browse/HIVE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5812: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Thank you for the contribution Prasad! I have committed this to trunk. HiveServer2 SSL connection transport binds to loopback address by default - Key: HIVE-5812 URL: https://issues.apache.org/jira/browse/HIVE-5812 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5812.1.patch, HIVE-5812.2.patch The secure socket transport implemented as part of HIVE-5351 binds to the loopback address by default. The bind interface gets used only if it is explicitly defined in hive-site or via the environment. This behavior should be the same as the non-SSL transport. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
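The bind-address behavior described in HIVE-5812 boils down to a one-line decision: use the explicitly configured interface if one is given, otherwise fall back to the wildcard address like the non-SSL transport, rather than defaulting to loopback. A minimal Python sketch of that intent (the helper name is invented for illustration; the actual fix lives in HiveServer2's Java transport setup):

```python
def choose_bind_host(configured_host):
    """Pick the interface a server transport should bind to.

    An explicitly configured host wins; with no configuration we bind
    the wildcard address (reachable from other hosts), not loopback.
    """
    return configured_host if configured_host else "0.0.0.0"

print(choose_bind_host(""))          # falls back to the wildcard address
print(choose_bind_host("10.0.0.5"))  # honors the configured interface
```

Binding loopback by default would make an SSL-enabled HiveServer2 unreachable from any other machine unless the operator remembered to set the bind host, which is the surprise the issue reports.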
[jira] [Commented] (HIVE-5928) Add a hive authorization plugin api that does not assume privileges needed
[ https://issues.apache.org/jira/browse/HIVE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850574#comment-13850574 ] Brock Noland commented on HIVE-5928: bq. interface HiveBaseAuthorizationProvider bq. There will be a subclass of HiveBaseAuthorizationProvider Since it doesn't look like this has been implemented yet...may I interject some thoughts? I think we should start moving hive development from inheritance to composition where possible[1]. This looks like a great place to start. [1] http://en.wikipedia.org/wiki/Composition_over_inheritance Add a hive authorization plugin api that does not assume privileges needed -- Key: HIVE-5928 URL: https://issues.apache.org/jira/browse/HIVE-5928 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Original Estimate: 120h Remaining Estimate: 120h The existing HiveAuthorizationProvider interface implementations can be used to support custom authorization models. But this interface limits the customization for these reasons - 1. It has assumptions about the privileges required for an action. 2. It does not have functions that you can implement for having custom ways of doing the actions of access control statements. This jira proposes a new interface HiveBaseAuthorizationProvider that does not make assumptions of the privileges required for the actions. The authorize() functions will be the equivalent of authorize(hive object, action). It will also have functions that will be called from the access control statements. The current HiveAuthorizationProvider will continue to be supported for backward compatibility. There will be a subclass of HiveBaseAuthorizationProvider that executes actions using this interface. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-4887) hive should have an option to disable non sql commands that impose security risk
[ https://issues.apache.org/jira/browse/HIVE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850583#comment-13850583 ] Brock Noland commented on HIVE-4887: bq. It should be possible to disable create function as well. I would kindly suggest the following: 1) have a whitelist of UDFs which can be used when authorization is enabled, as some UDFs are insecure by default - java_method() or transform(). 2) Add a URI privilege so admins can give users permission to access vetted jars. Then when someone creates a UDF you can verify the class exists in a jar they have privilege to access. hive should have an option to disable non sql commands that impose security risk Key: HIVE-4887 URL: https://issues.apache.org/jira/browse/HIVE-4887 Project: Hive Issue Type: Sub-task Components: Authorization, Security Reporter: Thejas M Nair Original Estimate: 72h Remaining Estimate: 72h Hive's RDBMS style of authorization (using grant/revoke), relies on all data access being done through hive select queries. But hive also supports running dfs commands, shell commands (eg !cat file), and shell commands through hive streaming. This creates problems in securing a hive server using this authorization model. UDF is another way to write custom code that can compromise security, but you can control that by restricting access to users to be only through jdbc connection to hive server (2). (note that there are other major problems such as this one - HIVE-3271) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5837) SQL standard based secure authorization for hive
[ https://issues.apache.org/jira/browse/HIVE-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850586#comment-13850586 ] Brock Noland commented on HIVE-5837: [~thejas], as I mentioned [here|https://issues.apache.org/jira/browse/HIVE-4887?focusedCommentId=13850583page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13850583] I would consider adding a URI privilege to the model described here. This allows the use of custom UDFs for users. Beyond that I think a SERVER privilege should be added as well. The reason I believe a server privilege is useful is that large deployments of Hive would like to take advantage of multiple HS2 instances while allowing users to only access a single instance. What are your thoughts on these topics? SQL standard based secure authorization for hive Key: HIVE-5837 URL: https://issues.apache.org/jira/browse/HIVE-5837 Project: Hive Issue Type: New Feature Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: SQL standard authorization hive.pdf The current default authorization is incomplete and not secure. The alternative of storage based authorization provides security but does not provide fine grained authorization. The proposal is to support secure fine grained authorization in hive using SQL standard based authorization model. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5380) Non-default OI constructors should be supported if for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850627#comment-13850627 ] Hive QA commented on HIVE-5380: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619113/HIVE-5380.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4789 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/667/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/667/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619113 Non-default OI constructors should be supported if for backwards compatibility -- Key: HIVE-5380 URL: https://issues.apache.org/jira/browse/HIVE-5380 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-5380.patch, HIVE-5380.patch In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then Custom Serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSon Serde work until we address the fact we don't want to have to copy the OI's. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2508 - Still Failing
Changes for Build #2472 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build 
#2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] 
HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872:
Hive-trunk-hadoop2 - Build # 607 - Still Failing
Changes for Build #571 [navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.) [navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu Zhang via Navis) [navis] HIVE-4518 : Missing file (HiveFatalException) [navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and Jason Dere via Navis) Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] 
HIVE-5581: Implement vectorized year/month/day... etc. for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with 
char/varchar columns (Jason Dere via Harish Butani) Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report
[jira] [Commented] (HIVE-6029) Add default authorization on database/table creation
[ https://issues.apache.org/jira/browse/HIVE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850685#comment-13850685 ] Hive QA commented on HIVE-6029: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619117/HIVE-6029.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4789 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.TestClientSideAuthorizationProvider.testSimplePrivileges org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/668/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/668/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619117 Add default authorization on database/table creation Key: HIVE-6029 URL: https://issues.apache.org/jira/browse/HIVE-6029 Project: Hive Issue Type: Improvement Components: Authorization, Metastore Affects Versions: 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Priority: Minor Attachments: HIVE-6029-1.patch.txt, HIVE-6029.2.patch Default authorization privileges are not set when a database/table is created. This allows a user to create a database/table and not be able to access it through Sentry. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5380) Non-default OI constructors should be supported for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5380: --- Summary: Non-default OI constructors should be supported for backwards compatibility (was: Non-default OI constructors should be supported if for backwards compatibility) Non-default OI constructors should be supported for backwards compatibility --- Key: HIVE-5380 URL: https://issues.apache.org/jira/browse/HIVE-5380 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-5380.patch, HIVE-5380.patch In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then Custom Serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSon Serde work until we address the fact we don't want to have to copy the OI's. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5380) Non-default OI constructors should be supported for backwards compatibility
[ https://issues.apache.org/jira/browse/HIVE-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850731#comment-13850731 ] Xuefu Zhang commented on HIVE-5380: --- +1 Non-default OI constructors should be supported for backwards compatibility --- Key: HIVE-5380 URL: https://issues.apache.org/jira/browse/HIVE-5380 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-5380.patch, HIVE-5380.patch In HIVE-5263 we started serializing OI's when cloning the plan. This was a great boost in speed for many queries. In the future we'd like to stop copying the OI's, perhaps in HIVE-4396. Until then Custom Serdes will not work on trunk. This is a fix to allow custom serdes such as the Hive JSon Serde work until we address the fact we don't want to have to copy the OI's. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850733#comment-13850733 ] Xuefu Zhang commented on HIVE-6021: --- +1, patch looks good to me. Problem in GroupByOperator for handling distinct aggrgations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-6021.1.patch, HIVE-6021.2.patch Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 
10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method fails to handle this case: in a query with distinct aggregations, an aggregation function has a parameter which is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names = parameters.get(j).getExprString().split("\\."); // parameters of the form : KEY.colx:t.coly if
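The distinction the quoted initializeOp() code must make can be modeled in a few lines. The following Python sketch is only a simplified illustration of the expression shapes involved, not the actual Java from GroupByOperator: a plain group-by key parameter looks like KEY._col0, while a distinct key parameter carries a union tag, as in KEY._col1:0._col0. Splitting on "." and assuming a ":" is present is exactly where the plain form goes wrong:

```python
def parse_agg_parameter(expr):
    """Classify an aggregation parameter expression.

    Plain group-by key:  "KEY._col0"
    Distinct key column: "KEY._col1:0._col0"  (":0" is the union tag)
    """
    parts = expr.split(".")
    if ":" in parts[1]:
        column, tag = parts[1].split(":")
        return {"distinct": True, "tag": int(tag), "column": parts[2]}
    # Non-distinct parameter that is a group-by key column: the case
    # HIVE-6021 reports as unhandled, where indexing past parts[1]
    # raises the equivalent of ArrayIndexOutOfBoundsException.
    return {"distinct": False, "column": parts[1]}
```

Code that unconditionally reads the tag and the third path component fails on the plain form with an index out of bounds, which mirrors the reported exception.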
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850867#comment-13850867 ] Kostiantyn Kudriavtsev commented on HIVE-3454: -- Hi there, when is this patch going to be applied to trunk? It seems this issue is important enough to be included. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0 Reporter: Ryan Harris Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
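The telltale 1970-01-16 result in HIVE-3454 is what you get when a value in seconds is interpreted as milliseconds: a present-day epoch value in seconds, divided by 1000, lands about two weeks after the epoch. A short Python illustration of the arithmetic (this only models the unit mismatch, not Hive's cast implementation):

```python
from datetime import datetime, timezone

# A unix timestamp in seconds, as unix_timestamp() returns (early Sept 2012).
unix_seconds = 1347000000

# Interpreted correctly, as seconds since the epoch:
as_seconds = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Interpreted as milliseconds (the effect the bug report describes),
# the same number collapses to mid-January 1970:
as_millis = datetime.fromtimestamp(unix_seconds / 1000, tz=timezone.utc)

print(as_seconds.date())  # 2012-09-07
print(as_millis.date())   # 1970-01-16
```

So any CAST(unix_timestamp() AS TIMESTAMP) under this interpretation will sit in the second half of January 1970, matching the reported symptom.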
[jira] [Commented] (HIVE-4887) hive should have an option to disable non sql commands that impose security risk
[ https://issues.apache.org/jira/browse/HIVE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850866#comment-13850866 ] Thejas M Nair commented on HIVE-4887: - [~brocknoland] thanks for the suggestions. That makes sense. Along with the 'add jar' privilege for URI, another complementary approach would be to support a concept of permanent (blessed) UDFs that an admin can add and which would be pre-registered for all users. hive should have an option to disable non sql commands that impose security risk Key: HIVE-4887 URL: https://issues.apache.org/jira/browse/HIVE-4887 Project: Hive Issue Type: Sub-task Components: Authorization, Security Reporter: Thejas M Nair Original Estimate: 72h Remaining Estimate: 72h Hive's RDBMS style of authorization (using grant/revoke), relies on all data access being done through hive select queries. But hive also supports running dfs commands, shell commands (eg !cat file), and shell commands through hive streaming. This creates problems in securing a hive server using this authorization model. UDF is another way to write custom code that can compromise security, but you can control that by restricting access to users to be only through jdbc connection to hive server (2). (note that there are other major problems such as this one - HIVE-3271) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6044: Status: Patch Available (was: Open) webhcat should be able to return detailed serde information when show table using format=extended --- Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-6044.1.patch Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info like line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table based on the information from the json file. The descExtendedTable function in HcatDelegator should also return extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6047) Permanent UDFs in Hive
Jason Dere created HIVE-6047: Summary: Permanent UDFs in Hive Key: HIVE-6047 URL: https://issues.apache.org/jira/browse/HIVE-6047 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Currently Hive only supports temporary UDFs which must be re-registered when starting up a Hive session. Provide some support to register permanent UDFs with Hive. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6046) add UDF for converting date time from one presentation to another
[ https://issues.apache.org/jira/browse/HIVE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850877#comment-13850877 ] Kostiantyn Kudriavtsev commented on HIVE-6046: -- just started working on that, your comments are welcome add UDF for converting date time from one presentation to another -- Key: HIVE-6046 URL: https://issues.apache.org/jira/browse/HIVE-6046 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor it'd be nice to have a function for converting datetime to different formats, for example: format_date('2013-12-12 00:00:00.0', 'yyyy-MM-dd HH:mm:ss.S', 'yyyy/MM/dd') There are two signatures to facilitate further use: format_date(datetime, fromFormat, toFormat) format_date(timestamp, toFormat) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5936: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Fix For: 0.13.0 Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.11.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850880#comment-13850880 ] Kostiantyn Kudriavtsev commented on HIVE-6006: -- patch is available. Could somebody please put the patch on ReviewBoard? That would make it easier for interested people to review. Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5837) SQL standard based secure authorization for hive
[ https://issues.apache.org/jira/browse/HIVE-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850885#comment-13850885 ] Thejas M Nair commented on HIVE-5837: - [~brocknoland] I assume you mean URI and SERVER as objects (similar to tables, views, etc.) on which privileges (e.g., select, insert, ...) can be granted. As you know, URI authorization is essential (more than just helping with UDF support); without it you cannot enforce access control (you can use 'create table' to read from any HDFS location). I see that the SERVER object will also be useful, but it is not essential for a first version. Should we make one of the SQL standard privileges available on the SERVER object? SQL standard based secure authorization for hive Key: HIVE-5837 URL: https://issues.apache.org/jira/browse/HIVE-5837 Project: Hive Issue Type: New Feature Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: SQL standard authorization hive.pdf The current default authorization is incomplete and not secure. The alternative of storage based authorization provides security but does not provide fine grained authorization. The proposal is to support secure fine grained authorization in hive using the SQL standard based authorization model. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: hive-6006.patch Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
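The proposed HaversineDistance function can be sketched as follows. This is a hypothetical helper illustrating the standard Haversine formula, not the code in the attached hive-6006.patch; the class and method names are made up for illustration.

```java
// Hypothetical sketch of HaversineDistance(lat1, lon1, lat2, lon2).
// Not taken from hive-6006.patch; names are illustrative only.
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // Haversine term: sin^2(dLat/2) + cos(lat1)*cos(lat2)*sin^2(dLon/2)
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Paris to Berlin, commonly cited as roughly 878 km great-circle
        System.out.println(haversineKm(48.8566, 2.3522, 52.5200, 13.4050));
    }
}
```

Inside a Hive UDF this computation would simply wrap the double arithmetic above with the usual argument-checking and ObjectInspector plumbing.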
[jira] [Commented] (HIVE-5837) SQL standard based secure authorization for hive
[ https://issues.apache.org/jira/browse/HIVE-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850887#comment-13850887 ] Thejas M Nair commented on HIVE-5837: - [~brocknoland] Thanks for your feedback in the jiras for SQL standard auth ! SQL standard based secure authorization for hive Key: HIVE-5837 URL: https://issues.apache.org/jira/browse/HIVE-5837 Project: Hive Issue Type: New Feature Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: SQL standard authorization hive.pdf The current default authorization is incomplete and not secure. The alternative of storage based authorization provides security but does not provide fine grained authorization. The proposal is to support secure fine grained authorization in hive using SQL standard based authorization model. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Patch Available (was: Open) hive-6006.patch has been attached Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6046) add UDF for converting date time from one presentation to another
Kostiantyn Kudriavtsev created HIVE-6046: Summary: add UDF for converting date time from one presentation to another Key: HIVE-6046 URL: https://issues.apache.org/jira/browse/HIVE-6046 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor it'd be nice to have a function for converting datetime to different formats, for example: format_date('2013-12-12 00:00:00.0', 'yyyy-MM-dd HH:mm:ss.S', 'yyyy/MM/dd') There are two signatures to facilitate further use: format_date(datetime, fromFormat, toFormat) format_date(timestamp, toFormat) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
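The three-argument signature above can be sketched with SimpleDateFormat. This is a hypothetical illustration of the proposed semantics, assuming Java date/time pattern syntax (which the JIRA example appears to use); it is not a patch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of format_date(datetime, fromFormat, toFormat).
// Assumes SimpleDateFormat pattern syntax; names are illustrative only.
public class FormatDateSketch {
    public static String formatDate(String value, String fromFormat, String toFormat) {
        try {
            // Parse with the source pattern, re-render with the target pattern.
            Date parsed = new SimpleDateFormat(fromFormat).parse(value);
            return new SimpleDateFormat(toFormat).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable input: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatDate("2013-12-12 00:00:00.0",
                "yyyy-MM-dd HH:mm:ss.S", "yyyy/MM/dd"));
    }
}
```

The two-argument timestamp variant would skip the parse step and format the timestamp directly.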
[jira] [Commented] (HIVE-5837) SQL standard based secure authorization for hive
[ https://issues.apache.org/jira/browse/HIVE-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850915#comment-13850915 ] Alan Gates commented on HIVE-5837: -- Brock, could you give more details on the SERVER use case? I've seen people use multiple instances of HS2 for HA/scaling, but never allocating some users to some instances and others to others. What's the motivation for that? SQL standard based secure authorization for hive Key: HIVE-5837 URL: https://issues.apache.org/jira/browse/HIVE-5837 Project: Hive Issue Type: New Feature Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: SQL standard authorization hive.pdf The current default authorization is incomplete and not secure. The alternative of storage based authorization provides security but does not provide fine grained authorization. The proposal is to support secure fine grained authorization in hive using SQL standard based authorization model. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2509 - Still Failing
Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850928#comment-13850928 ] Hive QA commented on HIVE-6006: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619157/hive-6006.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4792 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/669/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/669/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619157 Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16328: HIVE-5992: Hive inconsistently converts timestamp in AVG and SUM UDAF's
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16328/ --- Review request for hive and Prasad Mujumdar. Bugs: HIVE-5992 https://issues.apache.org/jira/browse/HIVE-5992 Repository: hive-git Description --- The fix is to make the two UDAFs convert timestamp to double in terms of seconds and the fraction of a second. A test is added to cover the case. Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 41d5efd ql/src/test/queries/clientpositive/timestamp_3.q e5a4345 ql/src/test/results/clientpositive/timestamp_3.q.out 8544307 Diff: https://reviews.apache.org/r/16328/diff/ Testing --- New unit test added. Regression suite run. Thanks, Xuefu Zhang
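The seconds-plus-fraction representation described above can be sketched as follows. This is a hypothetical helper showing the conversion the description names; it is not taken from the diff, and the class and method names are made up.

```java
import java.sql.Timestamp;

// Hypothetical illustration of representing a timestamp as a double of
// epoch seconds plus the fractional second. Not code from the HIVE-5992 diff.
public class TimestampToDoubleSketch {
    public static double toSeconds(Timestamp ts) {
        // getTime() is epoch milliseconds; getNanos() holds the sub-second part.
        // floorDiv keeps whole seconds consistent with the non-negative nanos.
        return Math.floorDiv(ts.getTime(), 1000L) + ts.getNanos() / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1500L); // 1.5 seconds after the epoch
        System.out.println(toSeconds(ts));
    }
}
```

With both AVG and SUM built on the same representation, the two UDAFs treat timestamps consistently instead of diverging.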
[jira] [Commented] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850932#comment-13850932 ] Hive QA commented on HIVE-6044: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619140/HIVE-6044.1.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/670/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/670/console Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] [INFO] Building Hive HCatalog Server Extensions 0.13.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hcatalog-server-extensions --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ hive-hcatalog-server-extensions --- [debug] execute contextualize [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/src/main/resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-hcatalog-server-extensions --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-hcatalog-server-extensions --- [INFO] Compiling 38 source files to /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/classes [WARNING] Note: Some input files use or override a deprecated API. [WARNING] Note: Recompile with -Xlint:deprecation for details. [WARNING] Note: Some input files use unchecked or unsafe operations. [WARNING] Note: Recompile with -Xlint:unchecked for details. 
[INFO] [INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ hive-hcatalog-server-extensions --- [debug] execute contextualize [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/src/test/resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hcatalog-server-extensions --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/tmp/conf [copy] Copying 4 files to /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hcatalog-server-extensions --- [INFO] Compiling 4 source files to /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/test-classes [WARNING] Note: Some input files use or override a deprecated API. [WARNING] Note: Recompile with -Xlint:deprecation for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hcatalog-server-extensions --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hcatalog-server-extensions --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/hive-hcatalog-server-extensions-0.13.0-SNAPSHOT.jar [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hcatalog-server-extensions --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/target/hive-hcatalog-server-extensions-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hcatalog/hive-hcatalog-server-extensions/0.13.0-SNAPSHOT/hive-hcatalog-server-extensions-0.13.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/server-extensions/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hcatalog/hive-hcatalog-server-extensions/0.13.0-SNAPSHOT/hive-hcatalog-server-extensions-0.13.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive HCatalog Webhcat Java Client 0.13.0-SNAPSHOT [INFO] [INFO] [INFO] ---
[jira] [Commented] (HIVE-6047) Permanent UDFs in Hive
[ https://issues.apache.org/jira/browse/HIVE-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850921#comment-13850921 ] Eric Hanson commented on HIVE-6047: --- Vectorized execution works with temporary UDFs through an adaptor. If you could verify that permanent UDFs added by users also work in vectorized mode with that adaptor, that'd be great. Permanent UDFs in Hive -- Key: HIVE-6047 URL: https://issues.apache.org/jira/browse/HIVE-6047 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Currently Hive only supports temporary UDFs which must be re-registered when starting up a Hive session. Provide some support to register permanent UDFs with Hive. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6043) Document incompatible changes in Hive 0.12 and trunk
[ https://issues.apache.org/jira/browse/HIVE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850810#comment-13850810 ] Sergey Shelukhin commented on HIVE-6043: HIVE-4914? It does have backward compat Document incompatible changes in Hive 0.12 and trunk Key: HIVE-6043 URL: https://issues.apache.org/jira/browse/HIVE-6043 Project: Hive Issue Type: Task Reporter: Brock Noland Priority: Blocker We need to document incompatible changes. For example * HIVE-5372 changed object inspector hierarchy breaking most if not all custom serdes * HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo so all custom serdes (fixed by HIVE-5380) * Hive 0.12 separates MapredWork into MapWork and ReduceWork which is used by Serdes * HIVE-5411 serializes expressions with Kryo which are used by custom serdes * HIVE-4827 removed the flag of hive.optimize.mapjoin.mapreduce (This flag was introduced in Hive 0.11 by HIVE-3952). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16329: HIVE-6039: Round, AVG and SUM functions reject char/varchar input while accepting string input
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16329/ --- Review request for hive and Prasad Mujumdar. Bugs: HIVE-6039 https://issues.apache.org/jira/browse/HIVE-6039 Repository: hive-git Description --- Allow input to these UDFs for char and varchar. Diffs - data/files/char_varchar_udf.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 4b219bd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 41d5efd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java fc9c1b2 ql/src/test/queries/clientpositive/char_varchar_udf.q PRE-CREATION ql/src/test/results/clientpositive/char_varchar_udf.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16329/diff/ Testing --- Unit tested. New test added. Test suite passed. Thanks, Xuefu Zhang
[jira] [Created] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
Shuaishuai Nie created HIVE-6044: Summary: webhcat should be able to return detailed serde information when show table using format=extended Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info such as line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table from the json file. The descExtendedTable function in HcatDelegator should also return extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6043) Document incompatible changes in Hive 0.12 and trunk
[ https://issues.apache.org/jira/browse/HIVE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-6043: --- Description: We need to document incompatible changes. For example * HIVE-5372 changed object inspector hierarchy breaking most if not all custom serdes * HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo so all custom serdes (fixed by HIVE-5380) * Hive 0.12 separates MapredWork into MapWork and ReduceWork which is used by Serdes * HIVE-5411 serializes expressions with Kryo which are used by custom serdes * HIVE-4827 removed the flag of hive.optimize.mapjoin.mapreduce (This flag was introduced in Hive 0.11 by HIVE-3952). was: We need to document incompatible changes. For example * HIVE-5372 changed object inspector hierarchy breaking most if not all custom serdes * HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo so all custom serdes (fixed by HIVE-5380) * Hive 0.12 separates MapredWork into MapWork and ReduceWork which is used by Serdes * HIVE-5411 serializes expressions with Kryo which are used by custom serdes Document incompatible changes in Hive 0.12 and trunk Key: HIVE-6043 URL: https://issues.apache.org/jira/browse/HIVE-6043 Project: Hive Issue Type: Task Reporter: Brock Noland Priority: Blocker We need to document incompatible changes. For example * HIVE-5372 changed object inspector hierarchy breaking most if not all custom serdes * HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo so all custom serdes (fixed by HIVE-5380) * Hive 0.12 separates MapredWork into MapWork and ReduceWork which is used by Serdes * HIVE-5411 serializes expressions with Kryo which are used by custom serdes * HIVE-4827 removed the flag of hive.optimize.mapjoin.mapreduce (This flag was introduced in Hive 0.11 by HIVE-3952). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6043) Document incompatible changes in Hive 0.12 and trunk
[ https://issues.apache.org/jira/browse/HIVE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13850759#comment-13850759 ] Yin Huai commented on HIVE-6043: I added HIVE-4827, which removed the flag of hive.optimize.mapjoin.mapreduce. Document incompatible changes in Hive 0.12 and trunk Key: HIVE-6043 URL: https://issues.apache.org/jira/browse/HIVE-6043 Project: Hive Issue Type: Task Reporter: Brock Noland Priority: Blocker We need to document incompatible changes. For example * HIVE-5372 changed object inspector hierarchy breaking most if not all custom serdes * HIVE-1511/HIVE-5263 serializes ObjectInspectors with Kryo so all custom serdes (fixed by HIVE-5380) * Hive 0.12 separates MapredWork into MapWork and ReduceWork which is used by Serdes * HIVE-5411 serializes expressions with Kryo which are used by custom serdes * HIVE-4827 removed the flag of hive.optimize.mapjoin.mapreduce (This flag was introduced in Hive 0.11 by HIVE-3952). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-6044: - Attachment: HIVE-6044.1.patch webhcat should be able to return detailed serde information when show table using format=extended --- Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-6044.1.patch Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info such as line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table from the json file. The descExtendedTable function in HcatDelegator should also return extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Review Request 16330: HIVE-6045- Beeline hivevars is broken for more than one hivevar
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16330/ --- Review request for hive. Bugs: HIVE-6045 https://issues.apache.org/jira/browse/HIVE-6045 Repository: hive-git Description --- The implementation appends hivevars to the jdbc url in the form var1=val1var2=val2$var3-val3 but the regex used to parse this is expecting the delimiter to be ;. Changed the regex to fit the hivevar format. Diffs - jdbc/src/java/org/apache/hive/jdbc/Utils.java 913dc46 Diff: https://reviews.apache.org/r/16330/diff/ Testing --- Looks like TestBeelineWithArgs is no longer being run, and there are a lot of failures there due to other changes even without this change. Probably we need to move that test, and see if we can add a unit test there for this case. Thanks, Szehon Ho
[jira] [Created] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
Szehon Ho created HIVE-6045: --- Summary: Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
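The mangling shown above comes down to how the hivevar fragment of the JDBC URL is split back into key=value pairs. A minimal sketch of that parsing (hypothetical helper, not the actual jdbc/Utils.java code): the pairs must be split on the same delimiter they were joined with, otherwise adjacent pairs run together.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of hivevar parsing; not the HIVE-6045 fix itself.
public class HiveVarParseSketch {
    // Split a fragment like "file1=/a;file2=/b" into an ordered var -> value map.
    public static Map<String, String> parseVars(String fragment, String delimiterRegex) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String pair : fragment.split(delimiterRegex)) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                vars.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return vars;
    }

    public static void main(String[] args) {
        System.out.println(parseVars(
            "file1=/user/szehon/file1;file2=/user/szehon/file2", ";"));
    }
}
```

If the split regex expects a delimiter that the URL-building code never inserts, the whole fragment parses as one pair and the values concatenate, which matches the mangled output reported in the issue.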
[jira] [Created] (HIVE-6048) Hive load data command rejects file with '+' in the name
Xuefu Zhang created HIVE-6048: - Summary: Hive load data command rejects file with '+' in the name Key: HIVE-6048 URL: https://issues.apache.org/jira/browse/HIVE-6048 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive> load data local inpath './t+est' into table test; FAILED: SemanticException Line 1:23 Invalid path ''./t+est'': No files matching path file:/home/xzhang/apache/hive7/t%20est {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
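The failure mode here is the classic form-style URL-decoding rule that '+' means a space. A short Python sketch of the decode semantics (Hive's actual path handling is Java; this only illustrates the rule that bites):

```python
from urllib.parse import quote, unquote, unquote_plus

# '+' is a legal filename character, but form-style decoding (the
# application/x-www-form-urlencoded convention) turns it into a space:
assert unquote_plus("t+est") == "t est"   # '+' decoded as a space
assert unquote("t+est") == "t+est"        # plain percent-decoding keeps '+'

# Round-tripping safely requires percent-encoding '+' before any decode:
assert unquote_plus(quote("t+est", safe="")) == "t+est"
```

In other words, a path component containing '+' must either be percent-encoded first or decoded with plain percent-decoding; mixing the two conventions mangles the name.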
[jira] [Updated] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6045: Attachment: HIVE-6045.patch Attaching a fix. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6045: Status: Patch Available (was: Open) Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6035) Windows: percentComplete returned by job status from WebHCat is null
[ https://issues.apache.org/jira/browse/HIVE-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-6035: -- Assignee: shanyu zhao Status: Patch Available (was: Open) Windows: percentComplete returned by job status from WebHCat is null Key: HIVE-6035 URL: https://issues.apache.org/jira/browse/HIVE-6035 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-6035.patch HIVE-5511 fixed the same problem on Linux, but it still broke on Windows. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30574 --- ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFLTrim.java https://reviews.apache.org/r/15654/#comment58540 For these new tests please change the package to org.apache.hadoop.hive.ql.udf.generic and move them to the directory src/test/org/apache/hadoop/hive/ql/udf/generic. - Carl Steinbach On Dec. 17, 2013, midnight, Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 17, 2013, midnight) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFs *pad and *trim using GenericUDF. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f
ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5829: - Status: Open (was: Patch Available) [~kamrul] I noted one small issue on RB related to the package names of the new tests. Other than that I think the patch is ready to commit. Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch This JIRA includes following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5829: - Component/s: UDF Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch This JIRA includes following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6029) Add default authorization on database/table creation
[ https://issues.apache.org/jira/browse/HIVE-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850977#comment-13850977 ] Chris Drome commented on HIVE-6029: --- [~brocknoland] the initial patch was only intended for informational purposes as requested by [~thejas]. There is much more clean-up to be done, so please do not consider this yet. I will try to look at your rebased patch in the next couple of days. Thanks for reviewing. Add default authorization on database/table creation Key: HIVE-6029 URL: https://issues.apache.org/jira/browse/HIVE-6029 Project: Hive Issue Type: Improvement Components: Authorization, Metastore Affects Versions: 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Priority: Minor Attachments: HIVE-6029-1.patch.txt, HIVE-6029.2.patch Default authorization privileges are not set when a database/table is created. This allows a user to create a database/table and not be able to access it through Sentry. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6049) Hive uses deprecated hadoop configuration in Hadoop 2.0
shanyu zhao created HIVE-6049: - Summary: Hive uses deprecated hadoop configuration in Hadoop 2.0 Key: HIVE-6049 URL: https://issues.apache.org/jira/browse/HIVE-6049 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.12.0 Reporter: shanyu zhao Running the hive CLI on hadoop 2.0, you'll see deprecated configuration warnings like this:
13/12/14 01:00:51 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/12/14 01:00:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-hadoop2 - Build # 608 - Still Failing
Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 :
[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850984#comment-13850984 ] Pala M Muthaia commented on HIVE-6028: -- Sergey, the same thing above works in hive 12 for a regular string column (as opposed to a partition column). In any case, given the cost of the fix vs the severity, we will avoid depending on type coercion and use proper literals. Partition predicate literals are not interpreted correctly. --- Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Attachments: Hive-6028-explain-plan.txt When parsing/analyzing a query, hive treats the partition predicate value as int instead of string. This breaks down and leads to incorrect results when the partition predicate value starts with the digit 0, e.g. hour=00, hour=05 etc. The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1;
-- this query returns incorrect results, i.e. just an empty set.
select * from test_partition_pred where hour=00;
OK
-- this query returns the correct result. Note the predicate value is a string literal
select * from test_partition_pred where hour='00';
OK
21 00
The explain plan illustrates how the query was interpreted. In particular, the partition predicate is pushed down as a regular filter clause, with hour=0 as the predicate. See the attached explain plan file. Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12; this used to work in Hive 0.11. 3. Not an issue when the partition value starts with an integer other than 0, e.g. hour=10, hour=11 etc. 4. As seen above, the workaround is to use a string literal, hour='00' etc.
This would not be too bad if, in the failing case, hive complained that partition hour=0 is not found, or complained that the literal type doesn't match the column type. Instead hive silently pushes it down as a filter clause, and the query succeeds with an empty set as the result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
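The type-coercion hazard reported in HIVE-6028 can be sketched in Python. This is purely illustrative of the round-trip problem (the actual behavior lives in Hive's Java query planner): once an unquoted literal is read as an integer, the leading zero is gone and the string-typed partition value can never match.

```python
# Partition column is declared STRING, so partition names keep leading zeros.
partition_values = ["00", "05", "10"]

# The unquoted SQL literal 00 is parsed as the integer 0 ...
literal = int("00")
# ... and converting it back to a string yields "0", not "00".
as_string = str(literal)
assert as_string == "0"
assert as_string not in partition_values   # no partition matches: silent empty result

# A quoted string literal is never coerced and matches as expected.
assert "00" in partition_values
```

This is why the workaround noted in the issue, quoting the literal as hour='00', restores correct results.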
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850993#comment-13850993 ] Hive QA commented on HIVE-3454: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12588389/HIVE-3454.patch {color:green}SUCCESS:{color} +1 4789 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/671/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/671/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12588389 Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0 Reporter: Ryan Harris Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp(). Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
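The reported 1970-01-16 result is consistent with a seconds-vs-milliseconds mixup: unix_timestamp() yields seconds, but the cast appears to scale the BIGINT as if it were milliseconds, shrinking the value by a factor of 1000. A Python sketch of that arithmetic (illustrative only, not Hive's implementation):

```python
from datetime import datetime, timezone

# A 2012-era epoch time, in seconds, as unix_timestamp() would return it.
seconds = 1347000000

# Interpreted correctly (as seconds), this is a date in September 2012.
as_intended = datetime.fromtimestamp(seconds, tz=timezone.utc)
assert as_intended.year == 2012

# Interpreted as milliseconds, the value collapses to ~15.6 days after the
# epoch -- exactly the 1970-01-16 timestamp reported in the issue.
as_millis = datetime.fromtimestamp(seconds / 1000, tz=timezone.utc)
assert (as_millis.year, as_millis.month, as_millis.day) == (1970, 1, 16)
```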
[jira] [Created] (HIVE-6050) JDBC backward compatibility is broken
Szehon Ho created HIVE-6050: --- Summary: JDBC backward compatibility is broken Key: HIVE-6050 URL: https://issues.apache.org/jira/browse/HIVE-6050 Project: Hive Issue Type: Bug Components: JDBC Reporter: Szehon Ho Connect from JDBC driver of Hive 0.12 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException: Could not establish connection to jdbc:hive2://hive-c5-mysql-1.ent.cloudera.com:1/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:158) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at com.cloudera.itest.hiveserver.UnmanagedHiveServer.createConnection(UnmanagedHiveServer.java:73) at com.cloudera.itest.AbstractTestWithStaticConfiguration.createConnection(AbstractTestWithStaticConfiguration.java:68) at com.cloudera.itest.FirstTest.sanityConnectionTest(FirstTest.java:19) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:292) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:160) at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:147) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:327) ... 
37 more {noformat} On code analysis, it looks like the 'client_protocol' scheme is a ThriftEnum, which doesn't seem to be backward-compatible. Look at the generated file 'TOpenSessionReq.java', the method TOpenSessionReqStandardScheme.read(). The method will call 'TProtocolVersion.findValue() on the thrift protocol's bytes, which returns null if the client is sending an enum value unknown to the server. Then struct.validate() at the end of the method will fail because protocol version is null. So doesn't look like the current backward-compatibility scheme will work. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
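The compatibility gap described above can be modeled in a few lines of Python. Names here are illustrative; the real logic is in the Thrift-generated Java (TProtocolVersion.findValue and struct.validate). The point is that an enum-typed required field maps unknown wire values to null, so an old server rejects a newer client instead of degrading gracefully.

```python
# An old server only knows protocol v1.
KNOWN_VERSIONS = {1: "V1"}

def find_value(wire_value):
    # Mirrors the findValue() behavior: None for unknown enum values.
    return KNOWN_VERSIONS.get(wire_value)

def validate(protocol_version):
    # Mirrors struct.validate(): a null required field is a hard failure.
    if protocol_version is None:
        raise ValueError("Required field 'client_protocol' is unset!")

assert find_value(1) == "V1"

# A newer client sends v4, which the old server has never heard of:
v = find_value(4)
assert v is None
try:
    validate(v)
    raised = False
except ValueError:
    raised = True
assert raised   # the connection fails instead of negotiating down
```

A version scheme based on an open-ended integer field (where the server clamps unknown values to its own maximum) would avoid this failure mode, which is essentially what the issue concludes.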
[jira] [Updated] (HIVE-6050) JDBC backward compatibility is broken
[ https://issues.apache.org/jira/browse/HIVE-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6050: Description: Connect from JDBC driver of Hive 0.12 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException: Could not establish connection to jdbc:hive2://localhost:1/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:158) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at com.cloudera.itest.hiveserver.UnmanagedHiveServer.createConnection(UnmanagedHiveServer.java:73) at com.cloudera.itest.AbstractTestWithStaticConfiguration.createConnection(AbstractTestWithStaticConfiguration.java:68) at com.cloudera.itest.FirstTest.sanityConnectionTest(FirstTest.java:19) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:292) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:160) at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:147) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:327) ... 37 more {noformat} On code analysis, it looks like the 'client_protocol' scheme is a ThriftEnum, which doesn't seem to be backward-compatible. 
Look at the code path in the generated file 'TOpenSessionReq.java', method TOpenSessionReqStandardScheme.read(): 1. The method will call 'TProtocolVersion.findValue()' on the thrift protocol's byte stream, which returns null if the client is sending an enum value unknown to the server. (v4 is unknown to server) 2. The method will then call struct.validate(), which will throw the above exception because of null version. So doesn't look like the current backward-compatibility scheme will work. was: Connect from JDBC driver of Hive 0.12 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException:
[jira] [Assigned] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-6028: -- Assignee: Sergey Shelukhin Partition predicate literals are not interpreted correctly. --- Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Assignee: Sergey Shelukhin Attachments: Hive-6028-explain-plan.txt When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 2100 explain plan illustrates how the query was interpreted. Particularly the partition predicate is pushed down as regular filter clause, with hour=0 as predicate. See attached explain plan file. Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. 
We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851006#comment-13851006 ] Sergey Shelukhin commented on HIVE-6028: Yeah, I agree that this is breakage in 12 compared to 11. Sorry for that. Good to know that the workaround works. I will resolve as dup of 4914, as the fix is contained therein. Partition predicate literals are not interpreted correctly. --- Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Attachments: Hive-6028-explain-plan.txt When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 2100 explain plan illustrates how the query was interpreted. Particularly the partition predicate is pushed down as regular filter clause, with hour=0 as predicate. See attached explain plan file. Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. 
This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
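The failure mode described above can be sketched in a few lines. This is an illustrative sketch, not Hive code: the unquoted literal `00` is parsed as the integer 0, pushed down as the filter `hour=0`, and then matches no partition whose STRING value is "00".

```python
# Illustrative sketch (not Hive code): why an unquoted literal like hour=00
# silently matches nothing against a STRING partition column.
partitions = ["00", "01", "09", "10"]

def match(partitions, literal):
    # Unquoted 00 is parsed as the integer 0; its string form "0" is then
    # compared against the partition values, so nothing matches.
    pred = str(literal)
    return [p for p in partitions if p == pred]

assert match(partitions, 0) == []          # hour=00  -> empty result set
assert match(partitions, "00") == ["00"]   # hour='00' -> correct match
assert match(partitions, 10) == ["10"]     # values without a leading 0 are unaffected
```

This also shows why the bug only bites for values starting with 0: `int("10")` round-trips to "10", but `int("00")` collapses to "0".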
[jira] [Commented] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task
[ https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851007#comment-13851007 ] Yin Huai commented on HIVE-5891: [~sunrui] Sorry for getting back late. I just took a look at QB. Seems it uses aliasToSubq to store the mapping from aliases to sub query expressions (QBExpr). Then, a QBExpr also stores a QB which represents the subquery QB. With this recursive way, all QBs for different levels of the query are stored. So, parseCtx.getQB() only gets the main query block and its id is null. I am not sure if we can get the right QB (the QB for a subquery) from GenMapRedUtils.splitTasks... Can you take a quick look to see if it is easy to get the correct QB? If so, we can use the id of a QB to replace INTNAME. If not, let's use joinTree.getId for those JoinOperators. Seems we do not need to take special care to DemuxOperator. Can you create a review request for your patch? I can leave comments on the review board. Also, since QBJoinTree.getJoinStreamDesc is not used, let's delete it. 
Alias conflict when merging multiple mapjoin tasks into their common child mapred task -- Key: HIVE-5891 URL: https://issues.apache.org/jira/browse/HIVE-5891 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Attachments: HIVE-5891.1.patch Use the following test case with HIVE 0.12: {quote} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; select * from ( select c.key from (select a.key from src a join src b on a.key=b.key group by a.key) tmp join src c on tmp.key=c.key union all select c.key from (select a.key from src a join src b on a.key=b.key group by a.key) tmp join src c on tmp.key=c.key ) x; {quote} We will get a NullPointerException from Union Operator: {quote} java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:0} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {_col0:0} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) ... 
4 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) ... 5 more {quote} The root cause is in CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask(). +--+ +--+ | MapJoin task | | MapJoin task | +--+ +--+ \ / \ / +--+ | Union task | +--+ CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: Union task. The two MapJoin tasks have the same alias name for their big
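The alias conflict described in this thread can be sketched as a plain dictionary merge. This is an illustrative sketch, not Hive code: when CommonJoinTaskDispatcher merges two MapJoin tasks that registered their big table under the same alias, one side's plan is silently clobbered, and a downstream operator later dereferences the missing entry.

```python
# Illustrative sketch (not Hive code): two map-join tasks using the same
# alias for their big table; merging them into one child task loses a plan.
mapjoin_a = {"$INTNAME": "plan for left subquery"}
mapjoin_b = {"$INTNAME": "plan for right subquery"}

merged = {}
merged.update(mapjoin_a)
merged.update(mapjoin_b)   # same key: the left subquery's plan is overwritten

assert len(merged) == 1    # only one of the two plans survives the merge

# Making the alias unique per query block (e.g. via the QB id or
# joinTree id discussed above) keeps both entries distinct:
merged_fixed = {"$INTNAME-qb1": mapjoin_a["$INTNAME"],
                "$INTNAME-qb2": mapjoin_b["$INTNAME"]}
assert len(merged_fixed) == 2
```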
[jira] [Resolved] (HIVE-6028) Partition predicate literals are not interpreted correctly.
[ https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-6028. Resolution: Duplicate Fix Version/s: 0.13.0 Partition predicate literals are not interpreted correctly. --- Key: HIVE-6028 URL: https://issues.apache.org/jira/browse/HIVE-6028 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Pala M Muthaia Assignee: Sergey Shelukhin Fix For: 0.13.0 Attachments: Hive-6028-explain-plan.txt When parsing/analyzing query, hive treats partition predicate value as int instead of string. This breaks down and leads to incorrect result when the partition predicate value starts with int 0, e.g: hour=00, hour=05 etc. The following repro illustrates the bug: -- create test table and partition, populate with some data create table test_partition_pred(col1 int) partitioned by (hour STRING); insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1; -- this query returns incorrect results, i.e. just empty set. select * from test_partition_pred where hour=00; OK -- this query returns correct result. Note predicate value is string literal select * from test_partition_pred where hour='00'; OK 2100 explain plan illustrates how the query was interpreted. Particularly the partition predicate is pushed down as regular filter clause, with hour=0 as predicate. See attached explain plan file. Note: 1. The type of the partition column is defined as string, not int. 2. This is a regression in Hive 0.12. This used to work in Hive 0.11 3. Not an issue when the partition value starts with integer other than 0, e.g hour=10, hour=11 etc. 4. As seen above, workaround is to use string literal hour='00' etc. This should not be too bad if in the failing case hive complains that partition hour=0 is not found, or complains literal type doesn't match column type. Instead hive silently pushes it down as filter clause, and query succeeds with empty set as result. 
We found this out in our production tables partitioned by hour, only a few days after it started occurring, when there were empty data sets for partitions hour=00 to hour=09. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6050) JDBC backward compatibility is broken
[ https://issues.apache.org/jira/browse/HIVE-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851013#comment-13851013 ] Szehon Ho commented on HIVE-6050: - [~ashutoshc] [~cwsteinbach] Do you guys have any thoughts/experiences on this issue? It seems like we would need to change client protocol version to use another data type, to get this to work. My thought was this should be ok, as backward-compatibility seem to be broken today anyway based on this analysis. JDBC backward compatibility is broken - Key: HIVE-6050 URL: https://issues.apache.org/jira/browse/HIVE-6050 Project: Hive Issue Type: Bug Components: JDBC Reporter: Szehon Ho Connect from JDBC driver of Hive 0.12 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException: Could not establish connection to jdbc:hive2://localhost:1/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:158) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at com.cloudera.itest.hiveserver.UnmanagedHiveServer.createConnection(UnmanagedHiveServer.java:73) at com.cloudera.itest.AbstractTestWithStaticConfiguration.createConnection(AbstractTestWithStaticConfiguration.java:68) at com.cloudera.itest.FirstTest.sanityConnectionTest(FirstTest.java:19) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:69) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:48) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:292) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! 
Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:160) at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:147) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:327) ... 37 more {noformat} On code analysis, it looks like the 'client_protocol' scheme is a ThriftEnum, which doesn't seem to be backward-compatible. Look at the code path in the generated file
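The enum incompatibility analyzed above can be sketched without Thrift. This is an illustrative sketch, not Thrift-generated code: an old server only recognizes the enum values that existed when its stubs were generated, so a newer client's protocol value deserializes to nothing and the required field appears unset.

```python
# Illustrative sketch (not Thrift-generated code): why a Thrift enum field
# is not backward compatible. A server built against protocol v1 only knows
# the enum values of that era; an unknown wire value maps to None, which
# surfaces as "Required field 'client_protocol' is unset!".
OLD_SERVER_KNOWN_VALUES = {0: "HIVE_CLI_SERVICE_PROTOCOL_V1"}

def old_server_read(client_protocol_value):
    # Unknown enum values are dropped rather than preserved.
    return OLD_SERVER_KNOWN_VALUES.get(client_protocol_value)

assert old_server_read(0) == "HIVE_CLI_SERVICE_PROTOCOL_V1"  # v1 client: ok
assert old_server_read(3) is None                            # v4 client: field "unset"
```

Switching the field to a plain integer (or making it optional with a default) would let an old server at least see a value it can reject gracefully, which is the protocol-change question raised in the comment.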
[jira] [Commented] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851016#comment-13851016 ] Eric Hanson commented on HIVE-5829: --- Looks good to me from the point of view of vectorization -- trim/ltrim/rtrim still vectorize. Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch This JIRA includes following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16229: HIVE-6010 create a test that would ensure vectorization produces same results as non-vectorized execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16229/ --- (Updated Dec. 17, 2013, 10:40 p.m.) Review request for hive and Jitendra Pandey. Bugs: HIVE-6010 https://issues.apache.org/jira/browse/HIVE-6010 Repository: hive-git Description --- See jira. Diffs (updated) - ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 79840c9 itests/qtest/pom.xml 971c5d3 itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 275e3d7 ql/src/test/queries/clientcompare/vectorized_math_funcs.q PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_00.qv PRE-CREATION ql/src/test/queries/clientcompare/vectorized_math_funcs_01.qv PRE-CREATION ql/src/test/templates/TestCompareCliDriver.vm PRE-CREATION Diff: https://reviews.apache.org/r/16229/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Attachment: HIVE-6010.03.patch Now that logarithms are fixed I can add them to the test. Trivial update, should not affect +1 as long as the test passes :) create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6010.01.patch, HIVE-6010.02.patch, HIVE-6010.03.patch, HIVE-6010.patch So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
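The comparison-driver idea behind this test can be sketched simply. This is an illustrative sketch of the approach, not the TestCompareCliDriver code; `run_query` is a hypothetical stand-in for executing a .q file with a configuration variant applied.

```python
import math

# Illustrative sketch: run the same query under two configuration variants
# (vectorization off and on) and require byte-identical results, so a
# regression in either path fails the test.
def run_query(query, vectorized):
    # Hypothetical stand-in for the CLI driver; both paths must agree.
    data = [1.0, 2.0, 4.0]
    return [math.log(x) for x in data]

q = "select log(c) from t"
baseline = run_query(q, vectorized=False)
candidate = run_query(q, vectorized=True)
assert baseline == candidate
```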
[jira] [Updated] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-6044: - Attachment: HIVE-6044.1.patch webhcat should be able to return detailed serde information when show table using format=extended --- Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-6044.1.patch Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info like line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table based on the information from the json file. The descExtendedTable function in HcatDelegator should also return extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-6044: - Attachment: (was: HIVE-6044.1.patch) webhcat should be able to return detailed serde information when show table using format=extended --- Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-6044.1.patch Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info like line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table based on the information from the json file. The descExtendedTable function in HcatDelegator should also return extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851066#comment-13851066 ] Hive QA commented on HIVE-6045: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619170/HIVE-6045.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4789 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestJdbcDriver2.testNewConnectionConfiguration {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/672/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/672/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619170 Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6035) Windows: percentComplete returned by job status from WebHCat is null
[ https://issues.apache.org/jira/browse/HIVE-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851067#comment-13851067 ] Hive QA commented on HIVE-6035: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618722/HIVE-6035.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/673/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/673/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-673/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'jdbc/src/java/org/apache/hive/jdbc/Utils.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/assembly/target shims/0.20S/target shims/0.23/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1551750. At revision 1551750. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12618722 Windows: percentComplete returned by job status from WebHCat is null Key: HIVE-6035 URL: https://issues.apache.org/jira/browse/HIVE-6035 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-6035.patch HIVE-5511 fixed the same problem on Linux, but it still broke on Windows. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851073#comment-13851073 ] Hive QA commented on HIVE-6010: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619183/HIVE-6010.03.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/674/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/674/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/itests (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-it --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf [copy] Copying 4 files to /data/hive-ptest/working/apache-svn-trunk-source/itests/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-it/0.13.0-SNAPSHOT/hive-it-0.13.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Integration - Custom Serde 0.13.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-it-custom-serde --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- 
maven-resources-plugin:2.5:resources (default-resources) @ hive-it-custom-serde --- [debug] execute contextualize [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/main/resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-it-custom-serde --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-it-custom-serde --- [INFO] Compiling 8 source files to /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/classes [INFO] [INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ hive-it-custom-serde --- [debug] execute contextualize [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/src/test/resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-it-custom-serde --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf [copy] Copying 4 files to /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-it-custom-serde --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-it-custom-serde --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-it-custom-serde --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.13.0-SNAPSHOT.jar [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-it-custom-serde --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/target/hive-it-custom-serde-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-serde/0.13.0-SNAPSHOT/hive-it-custom-serde-0.13.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/itests/custom-serde/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-serde/0.13.0-SNAPSHOT/hive-it-custom-serde-0.13.0-SNAPSHOT.pom [INFO]
[jira] [Commented] (HIVE-5837) SQL standard based secure authorization for hive
[ https://issues.apache.org/jira/browse/HIVE-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851079#comment-13851079 ] Brock Noland commented on HIVE-5837: bq. Should we make one of the sql standard privileges available on SERVER object? Privileges on the SERVER object can make sense but I feel the more important aspect is to ensure privileges are scoped to a SERVER for the reason I will outline below. bq. Brock, could you give more details on the SERVER use case? I've seen people use multiple instances of HS2 for HA/scaling, but never allocating some users to some instances and others to others. What's the motivation for that? It's a very similar use case to federation. Enterprises often want to isolate groups of users from using the same resource. The scenario is you have group A and group B and they cannot or do not want to share the same HS2. By having server in the hierarchy you can enforce the separation amongst HS2 instances. SQL standard based secure authorization for hive Key: HIVE-5837 URL: https://issues.apache.org/jira/browse/HIVE-5837 Project: Hive Issue Type: New Feature Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: SQL standard authorization hive.pdf The current default authorization is incomplete and not secure. The alternative of storage based authorization provides security but does not provide fine grained authorization. The proposal is to support secure fine grained authorization in hive using SQL standard based authorization model. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
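The SERVER-in-the-hierarchy argument can be sketched as a scoping rule. This is an illustrative sketch with hypothetical names, not the proposed authorization model: keying grants by (group, server, object) lets two HS2 instances serve disjoint user groups even when both expose the same database.

```python
# Illustrative sketch (hypothetical names): privileges scoped to a SERVER
# object, so group A on server1 cannot reach the same data via server2.
grants = {("groupA", "server1", "db.sales"): {"SELECT"},
          ("groupB", "server2", "db.sales"): {"SELECT"}}

def allowed(group, server, obj, priv):
    return priv in grants.get((group, server, obj), set())

assert allowed("groupA", "server1", "db.sales", "SELECT")
assert not allowed("groupA", "server2", "db.sales", "SELECT")  # isolated by server
```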
[jira] [Created] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
Eric Hanson created HIVE-6051: - Summary: Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
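The shape of a decimal column vector plus one representative column-column expression can be sketched as follows. This is an illustrative sketch, not the HIVE-6051 patch: a batch of decimal values with a parallel null mask, and an addition expression that loops over the whole batch with SQL null semantics.

```python
from decimal import Decimal

# Illustrative sketch (not the HIVE-6051 patch): a decimal column vector
# and one representative column-column addition expression.
class DecimalColumnVector:
    def __init__(self, values):
        self.vector = [Decimal(v) if v is not None else None for v in values]
        self.is_null = [v is None for v in values]

def add_col_col(a, b):
    # Vectorized-style loop over the batch; a null in either input yields
    # a null output entry, mirroring SQL semantics.
    out = DecimalColumnVector([None] * len(a.vector))
    for i in range(len(a.vector)):
        if not (a.is_null[i] or b.is_null[i]):
            out.vector[i] = a.vector[i] + b.vector[i]
            out.is_null[i] = False
    return out

x = DecimalColumnVector(["1.10", "2.50", None])
y = DecimalColumnVector(["0.90", None, "3.00"])
z = add_col_col(x, y)
assert z.vector[0] == Decimal("2.00")
assert z.is_null[1] and z.is_null[2]
```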
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Attachment: HIVE-6010.04.patch import was removed by some other patch, rebase create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6010.01.patch, HIVE-6010.02.patch, HIVE-6010.03.patch, HIVE-6010.04.patch, HIVE-6010.patch So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
[ https://issues.apache.org/jira/browse/HIVE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6051: -- Attachment: HIVE-6051.01.patch Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6051.01.patch Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851095#comment-13851095 ] Gunther Hagleitner commented on HIVE-5065: -- Part 2 adds some .q file tests. This is necessary to round out the non-.q file tests (some integration testing is necessary). Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Blocker Fix For: tez-branch Attachments: HIVE-5065-part-1.1.patch, HIVE-5065-part2.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5065: - Attachment: HIVE-5065-part2.1.patch Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Blocker Fix For: tez-branch Attachments: HIVE-5065-part-1.1.patch, HIVE-5065-part2.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5762) Implement vectorized support for the DECIMAL data type
[ https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851096#comment-13851096 ] Eric Hanson commented on HIVE-5762: --- See HIVE-6051 for column vector code based on Decimal128. Implement vectorized support for the DECIMAL data type -- Key: HIVE-5762 URL: https://issues.apache.org/jira/browse/HIVE-5762 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Add support to allow queries referencing DECIMAL columns and expression results to run efficiently in vectorized mode. Include unit tests and end-to-end tests. Before starting or at least going very far, please write design specification (a new section for the design spec attached to HIVE-4160) for how support for the different DECIMAL types should work in vectorized mode, and the roadmap, and have it reviewed. It may be feasible to re-use LongColumnVector and related VectorExpression classes for fixed-point decimal in certain data ranges. That should be at least considered to get faster performance and save code. For unlimited precision DECIMAL, a new column vector subtype may be needed, or a BytesColumnVector could be re-used. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5911) Recent change to schema upgrade scripts breaks file naming conventions
[ https://issues.apache.org/jira/browse/HIVE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851116#comment-13851116 ] Sergey Shelukhin commented on HIVE-5911: ping? :) Recent change to schema upgrade scripts breaks file naming conventions -- Key: HIVE-5911 URL: https://issues.apache.org/jira/browse/HIVE-5911 Project: Hive Issue Type: Bug Components: Metastore Reporter: Carl Steinbach Assignee: Sergey Shelukhin Attachments: HIVE-5911.01.patch, HIVE-5911.patch The changes made in HIVE-5700 break the convention for naming schema upgrade scripts. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: adding ANSI flag for hive
Agree on both points. For now, what I had in mind was double vs decimal, and other such backward compat vs SQL compat and potentially perf vs SQL compat cases. I think one flag would not be so bad... On Mon, Dec 16, 2013 at 8:29 AM, Alan Gates ga...@hortonworks.com wrote: A couple of thoughts on this: 1) If we did this I think we should have one flag, not many. As Thejas points out, your test matrix goes insane when you have too many flags and hence things don't get properly tested. 2) We could do this in an incremental way, where we create this new ANSI flag and are clear with users that for a while this will be evolving. That is, as we find new issues with data types, semantics, whatever, we will continue to change the behavior of this flag. At some point in the future (as Thejas suggests, at a 1.0 release) we could make this the default behavior. This avoids having to do a full sweep now and find everything that we want to change and make ANSI compliant and living with whatever we miss. Alan. On Dec 11, 2013, at 5:14 PM, Thejas Nair wrote: Having too many configs complicates things for the user, and also complicates the code, and you also end up having many untested combinations of config flags. I think we should identify a bunch of non compatible changes that we think are important, fix it in a branch and make a major version release (say 1.x). This is also related to HIVE-5875, where there is a discussion on switching the defaults for some of the configs to more desirable values, but non backward compatible values. On Wed, Dec 11, 2013 at 4:33 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it in compliance with MySQL (that's what JIRAs use as an example; I am assuming ANSI SQL is meant). 
The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought in compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues such as double-quoted identifiers. In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to HIVE to be able to force ANSI compliance. When this/ese flag/s is/are not set, for example, int/int division could return double for backward compat/perf, vectorization can skip the special case handling for division by zero/etc., etc. Wdyt? -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. 
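The int/int division case discussed in this thread can be made concrete with a small sketch in plain Java (illustrative only; the class and method names are invented here, and this is not Hive's actual expression code). It shows the two behaviors the proposed flag would switch between: truncating integer division for backward compat/perf, and SQL-style promotion to double, which also changes what division by zero produces.

```java
public class DivisionSemantics {

    // Backward-compat/perf-friendly mode: plain integer division truncates,
    // and dividing by zero throws ArithmeticException.
    static long intDiv(long a, long b) {
        return a / b;
    }

    // SQL/ANSI-leaning mode: promote to double, so 7 / 2 yields 3.5 and
    // x / 0 yields Infinity (or NaN for 0 / 0) instead of throwing.
    static double doubleDiv(long a, long b) {
        return (double) a / (double) b;
    }
}
```

Under one flag setting a query engine would pick `intDiv`-like semantics, under the other `doubleDiv`-like semantics; the vectorized special-case handling for division by zero mentioned above only matters in the second mode.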
[jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851133#comment-13851133 ] Eric Hanson commented on HIVE-5761: --- Hi Teddy, Are you going to work on this anytime soon? Please let me know one way or the other. Thanks! Eric Implement vectorized support for the DATE data type --- Key: HIVE-5761 URL: https://issues.apache.org/jira/browse/HIVE-5761 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Add support to allow queries referencing DATE columns and expression results to run efficiently in vectorized mode. This should re-use the code for the integer/timestamp types to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized integer and/or timestamp operations. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6044) webhcat should be able to return detailed serde information when show table using format=extended
[ https://issues.apache.org/jira/browse/HIVE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851137#comment-13851137 ] Hive QA commented on HIVE-6044: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619185/HIVE-6044.1.patch {color:green}SUCCESS:{color} +1 4789 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/675/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/675/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619185 webhcat should be able to return detailed serde information when show table using format=extended --- Key: HIVE-6044 URL: https://issues.apache.org/jira/browse/HIVE-6044 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-6044.1.patch Now in webhcat, when using GET ddl/database/:db/table/:table and format=extended, the return value is based on the query show table extended like. However, this query doesn't contain serde info like line.delim and field.delim. In this case, the user won't have enough information to reconstruct the exact same table based on the information from the json file. The descExtendedTable function in HcatDelegator should also return the extra fields from the query desc extended tablename, which contains the fields sd, retention, parameters, parametersSize and tableType. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16330: HIVE-6045- Beeline hivevars is broken for more than one hivevar
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16330/ --- (Updated Dec. 18, 2013, 12:39 a.m.) Review request for hive. Changes --- Today the hive-var is broken because of the following: 1. Beeline constructs the jdbc url fragment of hive-var with the '&' delimiter 2. JDBC uses the ';' delimiter to parse this hive-var fragment. The original patch had changed the JDBC parsing regex (part 2) to expect '&'. But some test cases were manually constructing JDBC url's with ';' as the delimiter and checking that the parsing works on that. One option is to change the test, which should be fine for JDBC URL backward-compatibility as this is a new feature for 0.13. But still I decided to minimize the impact and chose an alternate fix (part 1), so that beeline constructs the URL fragment with the ';' delimiter. Bugs: HIVE-6045 https://issues.apache.org/jira/browse/HIVE-6045 Repository: hive-git Description --- The implementation appends hivevars to the jdbc url in the form var1=val1&var2=val2&var3=val3 but the regex used to parse this expects the delimiter to be ';'. Changed the regex to fit the hivevar format. Diffs (updated) - jdbc/src/java/org/apache/hive/jdbc/Utils.java 913dc46 Diff: https://reviews.apache.org/r/16330/diff/ Testing --- Looks like TestBeelineWithArgs is no longer being run, and there are a lot of failures there due to other changes even without this change. Probably we need to move that test, and see if we can add a unit test there for this case. Thanks, Szehon Ho
Re: Review Request 16330: HIVE-6045- Beeline hivevars is broken for more than one hivevar
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16330/ --- (Updated Dec. 18, 2013, 12:41 a.m.) Review request for hive. Changes --- Today the hive-var is broken because of the following: 1. Beeline constructs the jdbc url fragment of hive-var with the '&' delimiter 2. JDBC uses the ';' delimiter to parse this hive-var fragment. The original patch had changed the JDBC parsing regex (part 2) to expect '&'. But some test cases were manually constructing JDBC url's with ';' as the delimiter and checking that the parsing works on that. One option is to change the test, which should be fine for JDBC URL backward-compatibility as this is a new feature for 0.13. But still I decided to minimize the impact and chose an alternate fix (part 1), so that beeline constructs the URL fragment with the ';' delimiter. Bugs: HIVE-6045 https://issues.apache.org/jira/browse/HIVE-6045 Repository: hive-git Description --- The implementation appends hivevars to the jdbc url in the form var1=val1&var2=val2&var3=val3 but the regex used to parse this expects the delimiter to be ';'. Changed the regex to fit the hivevar format. Diffs (updated) - beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java 1de5829 Diff: https://reviews.apache.org/r/16330/diff/ Testing --- Looks like TestBeelineWithArgs is no longer being run, and there are a lot of failures there due to other changes even without this change. Probably we need to move that test, and see if we can add a unit test there for this case. Thanks, Szehon Ho
[jira] [Updated] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6045: Attachment: HIVE-6045.1.patch Today the hive-var is broken because of the following: 1. Beeline constructs the jdbc url fragment of hive-var with the '&' delimiter 2. JDBC uses the ';' delimiter to parse this hive-var fragment. The original patch had changed the JDBC parsing regex (part 2) to expect '&'. But some test cases were manually constructing JDBC url's with ';' as the delimiter and checking that the parsing works on that. One option is to change the test, which should be fine for JDBC URL backward-compatibility as this is a new feature for 0.13. But still I decided to minimize the impact and chose an alternate fix (part 1), so that beeline constructs the URL fragment with the ';' delimiter. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced the --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
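The delimiter mismatch described above can be sketched outside of Hive (hypothetical class and method names; this is not the actual Utils.java or DatabaseConnection.java code): a fragment joined with one delimiter does not round-trip through a parser that splits on another.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveVarFragment {

    // Join hivevars into a URL fragment using the given delimiter,
    // mimicking what Beeline appends to the JDBC URL.
    static String build(Map<String, String> vars, String delim) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (sb.length() > 0) {
                sb.append(delim);
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Recover the hivevars by splitting on ';', as the JDBC side does.
    static Map<String, String> parse(String fragment) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String pair : fragment.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                vars.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return vars;
    }
}
```

Building with ';' round-trips both variables; building with '&' leaves the ';'-based parser with a single mangled entry, which matches the behavior reported in the bug. The committed fix takes the equivalent of the `build(vars, ";")` path so both sides agree on the delimiter.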
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/#review30570 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/16299/#comment58532 class PatternValidator was recently introduced in HiveConf, which doesn't let the user specify an invalid value for a config key. Using that here will be useful. metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java https://reviews.apache.org/r/16299/#comment58545 Shall we remove this if() altogether, and thus also the above newly introduced method? ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java https://reviews.apache.org/r/16299/#comment58546 conf should never be null here. If it is null, then it's a bug. Also, returning null in those cases seems incorrect. Let's remove this null conf check. ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java https://reviews.apache.org/r/16299/#comment58584 Since this method always returns true, no need for this if block. ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g https://reviews.apache.org/r/16299/#comment58585 There can never be the case that hiveconf == null. That would be a bug. Let's remove this null check. ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g https://reviews.apache.org/r/16299/#comment58586 It will be good to document all the places Identifier is used. Can be lifted straight from your html document. ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g https://reviews.apache.org/r/16299/#comment58587 Good to add a note here saying QuotedIdentifier is only optionally available for columns as of now. ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g https://reviews.apache.org/r/16299/#comment58588 Not related to this patch, but if you feel like it, it'll be good to add a comment about where CharSetNames are used. Not necessary though. - Ashutosh Chauhan On Dec. 16, 2013, 10:22 p.m., Harish Butani wrote: --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 16, 2013, 10:22 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 8c249a0 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 3deed45 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 17e6aad ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d18ea03 ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION 
ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_tblproperty.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16299/diff/ Testing --- added new tests for create, alter, delete, query with columns containing special characters. Tests start with
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851198#comment-13851198 ] Ashutosh Chauhan commented on HIVE-6013: Approach looks ok to me. Some implementation level comments on RB. One test scenario. If this is already covered in your test, feel free to ignore. Otherwise, can you add the following test. set hive.support.quoted.identifiers=column; create table t1 (aa int, ab string); select a.* from t1; -- this should select both columns. Also, you mentioned in the html doc that some of the jdbc api methods need to change, but I don't see any changes in the jdbc package. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
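The legacy behavior discussed in this issue — a backtick-quoted name in a select list interpreted as a column regular expression — can be sketched with java.util.regex (an illustration with invented names, not the actual SemanticAnalyzer logic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ColumnRegexSketch {

    // Expand a select-list entry by treating it as a regular expression
    // over the table's column names (Hive's historical interpretation
    // of backtick-quoted identifiers in select expressions).
    static List<String> expand(String selectExpr, List<String> columns) {
        Pattern p = Pattern.compile(selectExpr);
        List<String> matched = new ArrayList<>();
        for (String c : columns) {
            if (p.matcher(c).matches()) {
                matched.add(c);
            }
        }
        return matched;
    }
}
```

Under this interpretation, the pattern a.* matches both aa and ab on the suggested t1 table, which is why the test above should select both columns; the HIVE-6013 flag lets users opt into treating such quoted names as plain identifiers instead.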
[jira] [Created] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
Sergey Shelukhin created HIVE-6052: -- Summary: metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns Key: HIVE-6052 URL: https://issues.apache.org/jira/browse/HIVE-6052 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same to produce the same results. Probably, since both SQL pushdown and integer pushdown are just perf optimizations, we can remove it for JDO (or make it configurable and disable it by default), and uncripple SQL. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
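The root cause can be reproduced without the metastore (a minimal sketch with invented names, not Hive's JDO pushdown code): comparing integer partition values as strings diverges from numeric comparison as soon as a value is stored with leading zeroes.

```java
public class PartitionFilterSketch {

    // String equality, which is effectively what a substring-based
    // JDO pushdown performs on the stored partition value.
    static boolean stringEquals(String storedValue, String filterValue) {
        return storedValue.equals(filterValue);
    }

    // Numeric equality, the semantics the user actually expects
    // from an integer partition column filter.
    static boolean intEquals(String storedValue, String filterValue) {
        return Integer.parseInt(storedValue) == Integer.parseInt(filterValue);
    }
}
```

For a partition stored as "0012" and a filter value of "12", `stringEquals` misses the partition while `intEquals` finds it, which is the discrepancy this issue proposes to resolve by dropping (or disabling by default) the string-based comparison.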
[jira] [Commented] (HIVE-5230) Better error reporting by async threads in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851237#comment-13851237 ] Thejas M Nair commented on HIVE-5230: - Rebased patch looks good. I will commit it shortly (already been +1'd). Better error reporting by async threads in HiveServer2 -- Key: HIVE-5230 URL: https://issues.apache.org/jira/browse/HIVE-5230 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0, 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-5230.1.patch, HIVE-5230.1.patch, HIVE-5230.10.patch, HIVE-5230.2.patch, HIVE-5230.3.patch, HIVE-5230.4.patch, HIVE-5230.6.patch, HIVE-5230.7.patch, HIVE-5230.8.patch, HIVE-5230.9.patch [HIVE-4617|https://issues.apache.org/jira/browse/HIVE-4617] provides support for async execution in HS2. When a background thread gets an error, currently the client can only poll for the operation state and also the error with its stacktrace is logged. However, it will be useful to provide a richer error response like thrift API does with TStatus (which is constructed while building a Thrift response object). -- This message was sent by Atlassian JIRA (v6.1.4#6159)