Re: [DISCUSS] ORC separate project
On 4/10/15, 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: To Owen's explanation - Thanks. I guess my major concern is that we are seemingly breaking apart Hive's integrity and making it hard to release and maintain due to an increasing number of external dependencies. Let's say that Hive depends on a certain version of ORC (as a TLP) and it's found that ORC has a bug that seriously impacts Hive users. We cannot release Hive as fast as we could, since doing so would require the ORC community to fix the problem and make a release, over which the Hive PMC has no control. By contrast, the Hive community can quickly fix the problem and make a release without waiting for other projects. I'm not sure this move (ORC as a TLP) will be beneficial to the vast majority of Hive users.

You need to understand exactly what this brings about for Hive, in fact for those who do not use ORC today. With the proposed changes, competing formats like Parquet might be able to compete with ORC in terms of Hive features. That is the direct impact of standardizing on a Storage-API implementation. As an independent project, new ORC features cannot exploit the fact that ORC is included in the ql/ source tree to introduce circular dependencies between the ql.exec, orc, and ql.exec.vector classes.

As far as your concern about risk goes, I would ask for a comparison against the bug/release cycles of "STORED AS PARQUET". As a Hive contributor, I'm certain that if I find a core issue in Parquet, my patches would be welcome there. That should be beneficial to the Parquet community, but it might not align entirely along employer lines: my patch might be good, but my intention would be to migrate warehouses with parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive. Resolving that conflict is ideally left to the Parquet PMC and the ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).
Now - reverse that argument and replay it, except instead we're talking about the C++ ORC reader plus a non-ASF SQL competitor to Hive. If this is not convincing, let me propose that we spin off the metastore as a TLP tomorrow! http://incubator.apache.org/projects/hcatalog.html Cheers, Gopal
Add 'Reuben Kuhnert' as contributor.
Hi, Reuben has been trying to contact the dev list to add him as a Hive contributor, but for some reason his email is lost somewhere, and the list is not receiving his email. Could someone add him as a contributor and give him permissions to assign bugs to himself? His ID is a funny one :P sircodesalot Thanks, - Sergio
[jira] [Created] (HIVE-10305) TestOrcFile has a mistake that makes metadata test ineffective
Owen O'Malley created HIVE-10305: Summary: TestOrcFile has a mistake that makes metadata test ineffective Key: HIVE-10305 URL: https://issues.apache.org/jira/browse/HIVE-10305 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley

Two of the values (ByteBuffers) stored as user metadata in TestOrcFile.metaData were never flipped after being written, and thus were empty buffers. The test passes anyway because they are compared against equally empty buffers. We should fix the test so it performs the intended comparison.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
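For context, the ByteBuffer pitfall behind this bug can be sketched as follows (a minimal standalone example, not the actual TestOrcFile code): a buffer allocated exactly to size has position == limit after writing, so without flip() it reports zero remaining bytes and trivially compares equal to any other "empty" buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FlipDemo {
    // Write a string into a buffer sized exactly to fit it.
    static ByteBuffer fill(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(raw.length);
        buf.put(raw);
        return buf; // BUG if returned unflipped: position == limit
    }

    public static void main(String[] args) {
        ByteBuffer unflipped = fill("ORC metadata");
        // remaining() == limit - position == 0: the buffer looks empty,
        // so comparing it to another unflipped buffer always "passes".
        System.out.println(unflipped.remaining()); // 0

        ByteBuffer flipped = fill("ORC metadata");
        flipped.flip(); // position -> 0, limit -> number of bytes written
        System.out.println(flipped.remaining()); // 12
    }
}
```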
Add to Developer List
Hi, Can I be added to the Hive Developer List. My apache ID is 'sircodesalot'. Thank you
[jira] [Created] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.
Hari Sankar Sivarama Subramaniyan created HIVE-10306: Summary: We need to print tez summary when hive.server2.logging.level = PERFORMANCE. Key: HIVE-10306 URL: https://issues.apache.org/jira/browse/HIVE-10306 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan

We need to print the Tez summary when hive.server2.logging.level = PERFORMANCE. We introduced this parameter via HIVE-10119. The logging-level parameter is only relevant to HS2, so for hive-cli users hive.tez.exec.print.summary still makes sense. We can check the log-level parameter as well in the places where we check the value of hive.tez.exec.print.summary, i.e., consider hive.tez.exec.print.summary=true if the log level is PERFORMANCE.
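The proposed check could look roughly like this (a hypothetical sketch only; shouldPrintTezSummary and its parameters are illustrative names, not Hive's actual API): print the summary when either the explicit flag is set or the HS2 logging level is PERFORMANCE.

```java
public class TezSummaryPolicy {
    // Illustrative helper mirroring the suggestion above: treat
    // hive.tez.exec.print.summary as true whenever
    // hive.server2.logging.level = PERFORMANCE.
    static boolean shouldPrintTezSummary(boolean printSummaryFlag,
                                         String hs2LoggingLevel) {
        return printSummaryFlag
            || "PERFORMANCE".equalsIgnoreCase(hs2LoggingLevel);
    }

    public static void main(String[] args) {
        System.out.println(shouldPrintTezSummary(false, "PERFORMANCE")); // true
        System.out.println(shouldPrintTezSummary(true, "EXECUTION"));    // true
        System.out.println(shouldPrintTezSummary(false, "EXECUTION"));   // false
    }
}
```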
Re: Add 'Reuben Kuhnert' as contributor.
Done. Welcome sircodesalot :) Looking forward to your contributions! On Fri, Apr 10, 2015 at 12:54 PM, Sergio Pena sergio.p...@cloudera.com wrote: Hi, Reuben has been trying to contact the dev list to add him as a Hive contributor, but for some reason his email is lost somewhere, and the list is not receiving his email. Could someone add him as a contributor and give him permissions to assign bugs to himself? His ID is a funny one :P sircodesalot Thanks, - Sergio
Re: Review Request 31041: HIVE-9645 : Fold expressions involving null
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31041/ ---

(Updated April 10, 2015, 11:09 p.m.)

Review request for hive.

Changes
---
Preserves the type of the folded null constant.

Bugs: HIVE-9645
    https://issues.apache.org/jira/browse/HIVE-9645

Repository: hive-git

Description
---
Fold expressions involving null

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 14a1059
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java c76b15b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java b0768f2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java d18e1a7
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 513d030
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java 4cf6318
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java 55a47fb
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCbrt.java 732ce8a
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java 9858b4f
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFactorial.java ff63b1d
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLastDay.java 6ead4be
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLevenshtein.java 4bba876
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java 4234b76
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNextDay.java c0a0ab1
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSoundex.java ad72d05
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrunc.java 3bf8d34
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLevenshtein.java 9f14ffd
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFNextDay.java 83ded3c
  ql/src/test/queries/clientpositive/optimize_nullscan.q 8e2ae04
  ql/src/test/results/clientnegative/udf_add_months_error_1.q.out e128612
  ql/src/test/results/clientnegative/udf_last_day_error_1.q.out 71376e2
  ql/src/test/results/clientnegative/udf_next_day_error_1.q.out 1d9c25f
  ql/src/test/results/clientnegative/udf_next_day_error_2.q.out e23186a
  ql/src/test/results/clientpositive/annotate_stats_select.q.out 49c1a40
  ql/src/test/results/clientpositive/decimal_udf.q.out 59b5643
  ql/src/test/results/clientpositive/input8.q.out 47bac2c
  ql/src/test/results/clientpositive/input9.q.out 4666787
  ql/src/test/results/clientpositive/load_dyn_part14.q.out ccf6f82
  ql/src/test/results/clientpositive/num_op_type_conv.q.out 708fb51
  ql/src/test/results/clientpositive/optimize_nullscan.q.out 609e415
  ql/src/test/results/clientpositive/ppd_constant_expr.q.out 56813e4
  ql/src/test/results/clientpositive/spark/auto_join8.q.out 5b6cc80
  ql/src/test/results/clientpositive/spark/join8.q.out dcfbc3d
  ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 66db7bd
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a
  ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15
  ql/src/test/results/clientpositive/tez/optimize_nullscan.q.out 104654a
  ql/src/test/results/clientpositive/tez/vector_coalesce.q.out 2f7eb43
  ql/src/test/results/clientpositive/tez/vector_decimal_udf.q.out 96f19ac
  ql/src/test/results/clientpositive/tez/vector_elt.q.out b27798a
  ql/src/test/results/clientpositive/udf4.q.out 1dfd7f8
  ql/src/test/results/clientpositive/udf6.q.out 1de47ab
  ql/src/test/results/clientpositive/udf7.q.out e616fed
  ql/src/test/results/clientpositive/udf_case.q.out ed0aac0
  ql/src/test/results/clientpositive/udf_coalesce.q.out 322dc4e
  ql/src/test/results/clientpositive/udf_elt.q.out f8acbf2
  ql/src/test/results/clientpositive/udf_greatest.q.out 884095b
  ql/src/test/results/clientpositive/udf_hour.q.out 4eb5a00
  ql/src/test/results/clientpositive/udf_if.q.out a2d2c08
  ql/src/test/results/clientpositive/udf_instr.q.out 812f244
  ql/src/test/results/clientpositive/udf_isnull_isnotnull.q.out a7d45ea
  ql/src/test/results/clientpositive/udf_least.q.out 95e3467
  ql/src/test/results/clientpositive/udf_locate.q.out 1d10ecd
  ql/src/test/results/clientpositive/udf_minute.q.out ebd07c5
  ql/src/test/results/clientpositive/udf_nvl.q.out 5042577
  ql/src/test/results/clientpositive/udf_parse_url.q.out f657fa9
  ql/src/test/results/clientpositive/udf_second.q.out fcd1143
  ql/src/test/results/clientpositive/udf_size.q.out 95b8e61
  ql/src/test/results/clientpositive/udf_trunc.q.out b9b2c48
  ql/src/test/results/clientpositive/udf_when.q.out 52f15b3
  ql/src/test/results/clientpositive/vector_coalesce.q.out 096ee22
  ql/src/test/results/clientpositive/vector_decimal_udf.q.out
[jira] [Created] (HIVE-10307) Support to use number literals in partition column
Chaoyu Tang created HIVE-10307: -- Summary: Support to use number literals in partition column Key: HIVE-10307 URL: https://issues.apache.org/jira/browse/HIVE-10307 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang

Data types like TinyInt, SmallInt, BigInt, or Decimal can be expressed as literals with a postfix like Y, S, L, or BD appended to the number. These literals work in most Hive queries, but not when they are used as a partition column value. For a partitioned table like:

create table partcoltypenum (key int, value string) partitioned by (tint tinyint, sint smallint, bint bigint);
insert into partcoltypenum partition (tint=100Y, sint=1S, bint=1000L) select key, value from src limit 30;

queries like select, describe, and drop partition do not work. For example:

select * from partcoltypenum where tint=100Y and sint=1S and bint=1000L;

does not return any rows.
Re: Add to Developer List
done On Fri, Apr 10, 2015 at 12:36 PM, Reuben Kuhnert reuben.kuhn...@cloudera.com wrote: Hi, Can I be added to the Hive Developer List. My apache ID is 'sircodesalot'. Thank you
[jira] [Created] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP
Selina Zhang created HIVE-10308: --- Summary: Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP Key: HIVE-10308 URL: https://issues.apache.org/jira/browse/HIVE-10308 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.1, 0.14.0, 1.2.0, 1.1.0 Reporter: Selina Zhang Assignee: Selina Zhang

Steps to reproduce:

CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, "one", 2, "two") FROM src LIMIT 1;
CREATE TABLE test (key INT);
INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;
set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=false;
select l.key from test l left outer join test_orc r on (l.key = r.a) where r.a is not null;

Stack trace:

Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
  at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
  at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
  at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
  at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
  at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
  at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
  at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
  at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
  at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)
Re: [DISCUSS] ORC separate project
To Lefty's comment - Yes, anyone can take Apache code and make another project at will. However, changes made to an existing project as part of that process, such as what Owen described for ORC in Hive, are certainly something the Hive PMC can control or vote on. Nevertheless, that's not my immediate concern.

To Owen's explanation - Thanks. I guess my major concern is that we are seemingly breaking apart Hive's integrity and making it hard to release and maintain due to an increasing number of external dependencies. Let's say that Hive depends on a certain version of ORC (as a TLP) and it's found that ORC has a bug that seriously impacts Hive users. We cannot release Hive as fast as we could, since doing so would require the ORC community to fix the problem and make a release, over which the Hive PMC has no control. By contrast, the Hive community can quickly fix the problem and make a release without waiting for other projects. I'm not sure this move (ORC as a TLP) will be beneficial to the vast majority of Hive users. If this is not convincing, let me propose that we spin off the metastore as a TLP tomorrow!

Thanks, Xuefu

On Wed, Apr 8, 2015 at 8:33 AM, Owen O'Malley omal...@apache.org wrote: On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang xzh...@cloudera.com wrote: If I understood Allen's #2 comment, we are moving existing ORC code out of Hive and making it a separate project, which I definitely missed. I'm sorry that wasn't clear. Yes, most of the code that is currently in org.apache.hadoop.hive.ql.io.orc will move to the new project. The biggest change on the Hive side will be to create a new Hive module that defines the API that storage formats like ORC need to code against if they want high-performance integration with Hive's vectorization. I've started that jira at https://issues.apache.org/jira/browse/HIVE-10171 . Creating this API should give us a clean interface for storage formats that will help ORC and other columnar formats like Trevni or Parquet.
Once the ORC project has made its first release, we can create a Hive jira to replace the Hive ORC code with a reference to the ORC release jar.

Since the existing Hive PMC has governance over the code, I would expect that to still be the case even after the spinoff.

No, Apache doesn't allow umbrella projects where one PMC controls sub-projects. The reason is that the Apache board has found that having projects controlled directly by their own PMC, instead of indirectly through another PMC, reduces problems. .. Owen