Re: [DISCUSS] ORC separate project

2015-04-10 Thread Gopal Vijayaraghavan


On 4/10/15, 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote:

> To Owen's explanation - Thanks. I guess my major concern is that we
> seemingly are breaking apart Hive's integrity and making it hard to
> release and maintain due to increasing number of external dependents.
> Let's say that Hive depends on a certain version of ORC (as TLP) and it's
> found that ORC has a bug that seriously impacts Hive users. We cannot
> release Hive as fast as we could, since doing so would need the ORC
> community to fix the problem and make a release, over which the Hive PMC
> has no control. By contrast, today the Hive community can quickly fix the
> problem and make a release without waiting for other projects to make a
> release. I'm not sure this move (ORC as TLP) will be beneficial to the
> vast majority of Hive users.

You need to understand exactly what this brings about for Hive, in fact for
those who do not use ORC today.

With the proposed changes, competing formats like Parquet might be able to
compete with ORC in terms of Hive features.

That is the direct impact of standardization of a Storage-API
implementation.

As an independent project, new ORC features cannot use the fact that it is
included in the ql/ source to introduce circular dependencies between
ql.exec - orc - ql.exec.vector classes.

As far as your concerns about risk go, I would ask for a comparison against
the bugs/release cycles of "STORED AS PARQUET".

As a Hive contributor, I'm certain that if I find a core issue in Parquet,
my patches would be welcome there.

That should be beneficial to the Parquet community, but might not be
aligned entirely along employer lines, since my patch might be good, but
my intention would be to migrate warehouses with
parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive.

Resolving that conflict should ideally be left to the Parquet IPMC and the
ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).

Now - reverse that argument and replay it, except instead we're talking
about the C++ ORC reader plus a non-ASF SQL competitor to Hive.


> If this is not convincing, let me propose that we spin off metastore also as
> TLP tomorrow!

http://incubator.apache.org/projects/hcatalog.html

Cheers,
Gopal




Add 'Reuben Kuhnert' as contributor.

2015-04-10 Thread Sergio Pena
Hi,

Reuben has been trying to contact the dev list to add him as a Hive
contributor, but for some reason his email is lost somewhere, and the list
is not receiving his email.

Could someone add him as a contributor and give him permissions to assign
bugs to himself?

His ID is a funny one :P

sircodesalot


Thanks,
- Sergio


[jira] [Created] (HIVE-10305) TestOrcFile has a mistake that makes metadata test ineffective

2015-04-10 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-10305:


 Summary: TestOrcFile has a mistake that makes metadata test ineffective
 Key: HIVE-10305
 URL: https://issues.apache.org/jira/browse/HIVE-10305
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Two of the values that are being stored as user metadata in 
TestOrcFile.metaData weren't flipped and thus were empty buffers. The test 
passes because they are compared to empty buffers. We should fix the test so 
that it actually checks the stored metadata values.
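The failure mode comes from java.nio.ByteBuffer semantics: after put(), the position sits at the end of the written bytes, so a buffer that is never flipped has nothing remaining to read, and ByteBuffer.equals only compares the remaining bytes. A minimal sketch that isolates this behavior, independent of TestOrcFile (the class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: isolates the java.nio behavior behind HIVE-10305,
// not the TestOrcFile code itself.
class UnflippedBufferDemo {

    // Writes data into a new buffer; if flip is true, makes the written
    // bytes readable again (limit = bytes written, position = 0).
    static ByteBuffer fill(byte[] data, boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);   // position now equals data.length, so remaining() == 0
        if (flip) {
            buf.flip();
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer empty = ByteBuffer.allocate(0);

        ByteBuffer unflipped = fill(payload, false);
        ByteBuffer flipped = fill(payload, true);

        // ByteBuffer.equals compares only the remaining bytes, so an
        // unflipped buffer is "equal" to an empty one.
        System.out.println(unflipped.remaining() + " " + unflipped.equals(empty)); // 0 true
        System.out.println(flipped.remaining() + " " + flipped.equals(empty));     // 5 false
    }
}
```

In the real test the fix is presumably to flip() each buffer before handing it to the writer; the sketch only shows why the unflipped version silently passes.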



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Add to Developer List

2015-04-10 Thread Reuben Kuhnert
Hi, can I be added to the Hive developer list? My Apache ID is
'sircodesalot'.

Thank you


[jira] [Created] (HIVE-10306) We need to print tez summary when hive.server2.logging.level = PERFORMANCE.

2015-04-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-10306:


 Summary: We need to print tez summary when hive.server2.logging.level = PERFORMANCE.
 Key: HIVE-10306
 URL: https://issues.apache.org/jira/browse/HIVE-10306
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


We need to print the Tez summary when hive.server2.logging.level = PERFORMANCE. We 
introduced this parameter via HIVE-10119.
The logging-level parameter is only relevant to HS2, so for hive-cli users 
hive.tez.exec.print.summary still makes sense. Wherever we check the value of 
hive.tez.exec.print.summary, we can also check the logging-level parameter, i.e., 
treat hive.tez.exec.print.summary as true if the logging level is PERFORMANCE.
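A minimal sketch of the proposed check, assuming a plain Map<String, String> stands in for HiveConf. The class and method names are hypothetical and this is not Hive's code; it only encodes "print if the flag is set, or if the HS2 logging level is PERFORMANCE":

```java
import java.util.Map;

// Hypothetical sketch of the HIVE-10306 proposal; a Map stands in for HiveConf.
class TezSummaryPolicy {

    static final String PRINT_SUMMARY = "hive.tez.exec.print.summary";
    static final String HS2_LOG_LEVEL = "hive.server2.logging.level";

    // True if the explicit flag is set, or if the HS2 logging level is
    // PERFORMANCE (treated as an implicit "print summary = true").
    static boolean shouldPrintSummary(Map<String, String> conf) {
        boolean explicitFlag = Boolean.parseBoolean(conf.getOrDefault(PRINT_SUMMARY, "false"));
        // The level is only set for HiveServer2; CLI users keep relying on
        // the explicit flag. A missing level compares as false here.
        boolean perfLevel = "PERFORMANCE".equals(conf.get(HS2_LOG_LEVEL));
        return explicitFlag || perfLevel;
    }

    public static void main(String[] args) {
        System.out.println(shouldPrintSummary(Map.of()));                              // false
        System.out.println(shouldPrintSummary(Map.of(PRINT_SUMMARY, "true")));         // true
        System.out.println(shouldPrintSummary(Map.of(HS2_LOG_LEVEL, "PERFORMANCE")));  // true
    }
}
```

Keeping the check in one helper means every place that currently reads hive.tez.exec.print.summary would pick up the PERFORMANCE rule consistently.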




Re: Add 'Reuben Kuhnert' as contributor.

2015-04-10 Thread Thejas Nair
Done.
Welcome sircodesalot :)
Looking forward to your contributions!


On Fri, Apr 10, 2015 at 12:54 PM, Sergio Pena sergio.p...@cloudera.com wrote:
 Hi,

 Reuben has been trying to contact the dev list to add him as a Hive
 contributor, but for some reason his email is lost somewhere, and the list
 is not receiving his email.

 Could someone add him as a contributor and give him permissions to assign
 bugs to himself?

 His ID is a funny one :P

 sircodesalot


 Thanks,
 - Sergio


Re: Review Request 31041: HIVE-9645 : Fold expressions involving null

2015-04-10 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31041/
---

(Updated April 10, 2015, 11:09 p.m.)


Review request for hive.


Changes
---

Preserves type of folded null-constant.


Bugs: HIVE-9645
https://issues.apache.org/jira/browse/HIVE-9645


Repository: hive-git


Description
---

Fold expressions involving null
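To illustrate what folding expressions involving null means, and what "Preserves type of folded null-constant" refers to, here is a toy sketch. It assumes a tiny hypothetical expression tree (not Hive's ExprNodeDesc classes), a hand-picked set of null-intolerant operators, and "void" as the illustrative type of an untyped NULL literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of null folding; not Hive's ConstantPropagateProcFactory.
class NullFolding {

    // Operators assumed to return NULL whenever any argument is NULL.
    static final Set<String> NULL_INTOLERANT = Set.of("+", "-", "*", "/");

    abstract static class Expr {
        final String type;                  // result type name, e.g. "int"
        Expr(String type) { this.type = type; }
    }

    static final class Const extends Expr {
        final Object value;                 // null represents SQL NULL
        Const(String type, Object value) { super(type); this.value = value; }
    }

    static final class Column extends Expr {
        final String name;
        Column(String type, String name) { super(type); this.name = name; }
    }

    static final class Call extends Expr {
        final String op;
        final List<Expr> args;
        Call(String type, String op, List<Expr> args) { super(type); this.op = op; this.args = args; }
    }

    static boolean isNullConst(Expr e) {
        return e instanceof Const && ((Const) e).value == null;
    }

    // Folds bottom-up: a null-intolerant call with any NULL argument becomes
    // a NULL constant that keeps the call's result type, not the literal's.
    static Expr fold(Expr e) {
        if (!(e instanceof Call)) {
            return e;
        }
        Call call = (Call) e;
        List<Expr> folded = new ArrayList<>();
        for (Expr arg : call.args) {
            folded.add(fold(arg));
        }
        if (NULL_INTOLERANT.contains(call.op)) {
            for (Expr arg : folded) {
                if (isNullConst(arg)) {
                    return new Const(call.type, null);
                }
            }
        }
        return new Call(call.type, call.op, folded);
    }

    public static void main(String[] args) {
        // (key + NULL) * 2, where key is an int column and NULL is untyped.
        Expr sum = new Call("int", "+", List.of(new Column("int", "key"), new Const("void", null)));
        Expr product = new Call("int", "*", List.of(sum, new Const("int", 2)));

        Expr result = fold(product);
        System.out.println(isNullConst(result) + " " + result.type);   // true int
    }
}
```

Keeping the result type on the folded constant matters because downstream operators (vectorization, stats annotation) still need to know the column type even when its value is known to be NULL.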


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 14a1059
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java c76b15b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java b0768f2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java d18e1a7
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 513d030
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java 4cf6318
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java 55a47fb
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCbrt.java 732ce8a
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java 9858b4f
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFactorial.java ff63b1d
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLastDay.java 6ead4be
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLevenshtein.java 4bba876
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java 4234b76
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNextDay.java c0a0ab1
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSoundex.java ad72d05
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrunc.java 3bf8d34
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLevenshtein.java 9f14ffd
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFNextDay.java 83ded3c
  ql/src/test/queries/clientpositive/optimize_nullscan.q 8e2ae04 
  ql/src/test/results/clientnegative/udf_add_months_error_1.q.out e128612 
  ql/src/test/results/clientnegative/udf_last_day_error_1.q.out 71376e2 
  ql/src/test/results/clientnegative/udf_next_day_error_1.q.out 1d9c25f 
  ql/src/test/results/clientnegative/udf_next_day_error_2.q.out e23186a 
  ql/src/test/results/clientpositive/annotate_stats_select.q.out 49c1a40 
  ql/src/test/results/clientpositive/decimal_udf.q.out 59b5643 
  ql/src/test/results/clientpositive/input8.q.out 47bac2c 
  ql/src/test/results/clientpositive/input9.q.out 4666787 
  ql/src/test/results/clientpositive/load_dyn_part14.q.out ccf6f82 
  ql/src/test/results/clientpositive/num_op_type_conv.q.out 708fb51 
  ql/src/test/results/clientpositive/optimize_nullscan.q.out 609e415 
  ql/src/test/results/clientpositive/ppd_constant_expr.q.out 56813e4 
  ql/src/test/results/clientpositive/spark/auto_join8.q.out 5b6cc80 
  ql/src/test/results/clientpositive/spark/join8.q.out dcfbc3d 
  ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 66db7bd 
  ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
  ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
  ql/src/test/results/clientpositive/tez/optimize_nullscan.q.out 104654a 
  ql/src/test/results/clientpositive/tez/vector_coalesce.q.out 2f7eb43 
  ql/src/test/results/clientpositive/tez/vector_decimal_udf.q.out 96f19ac 
  ql/src/test/results/clientpositive/tez/vector_elt.q.out b27798a 
  ql/src/test/results/clientpositive/udf4.q.out 1dfd7f8 
  ql/src/test/results/clientpositive/udf6.q.out 1de47ab 
  ql/src/test/results/clientpositive/udf7.q.out e616fed 
  ql/src/test/results/clientpositive/udf_case.q.out ed0aac0 
  ql/src/test/results/clientpositive/udf_coalesce.q.out 322dc4e 
  ql/src/test/results/clientpositive/udf_elt.q.out f8acbf2 
  ql/src/test/results/clientpositive/udf_greatest.q.out 884095b 
  ql/src/test/results/clientpositive/udf_hour.q.out 4eb5a00 
  ql/src/test/results/clientpositive/udf_if.q.out a2d2c08 
  ql/src/test/results/clientpositive/udf_instr.q.out 812f244 
  ql/src/test/results/clientpositive/udf_isnull_isnotnull.q.out a7d45ea 
  ql/src/test/results/clientpositive/udf_least.q.out 95e3467 
  ql/src/test/results/clientpositive/udf_locate.q.out 1d10ecd 
  ql/src/test/results/clientpositive/udf_minute.q.out ebd07c5 
  ql/src/test/results/clientpositive/udf_nvl.q.out 5042577 
  ql/src/test/results/clientpositive/udf_parse_url.q.out f657fa9 
  ql/src/test/results/clientpositive/udf_second.q.out fcd1143 
  ql/src/test/results/clientpositive/udf_size.q.out 95b8e61 
  ql/src/test/results/clientpositive/udf_trunc.q.out b9b2c48 
  ql/src/test/results/clientpositive/udf_when.q.out 52f15b3 
  ql/src/test/results/clientpositive/vector_coalesce.q.out 096ee22 
  ql/src/test/results/clientpositive/vector_decimal_udf.q.out 

[jira] [Created] (HIVE-10307) Support to use number literals in partition column

2015-04-10 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-10307:
--

 Summary: Support to use number literals in partition column
 Key: HIVE-10307
 URL: https://issues.apache.org/jira/browse/HIVE-10307
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang


Data types like TinyInt, SmallInt, BigInt or Decimal can be expressed as 
literals with a suffix such as Y, S, L, or BD appended to the number. These 
literals work in most Hive queries, but not when they are used as partition 
column values. For a partitioned table like:

create table partcoltypenum (key int, value string) partitioned by (tint tinyint, sint smallint, bint bigint);
insert into partcoltypenum partition (tint=100Y, sint=1S, bint=1000L) select key, value from src limit 30;

queries like select, describe and drop partition do not work. For example,

select * from partcoltypenum where tint=100Y and sint=1S and bint=1000L;

does not return any rows.
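For reference, the suffix rule described above maps cleanly onto Java numeric types. The sketch below is hypothetical (the class and method names are invented, only upper-case suffixes are handled) and is not the HIVE-10307 fix. It simply shows that "100Y" and its canonical value text "100" differ as strings, so any partition lookup that compares the raw suffixed text against a normalized value, in either direction, could plausibly miss. That mismatch is an assumption, not the diagnosis from the issue:

```java
import java.math.BigDecimal;

// Hypothetical sketch: parse numeric literals with a Hive-style type suffix
// (Y = tinyint, S = smallint, L = bigint, BD = decimal) and produce their
// canonical text without the suffix. Only upper-case suffixes are handled.
class SuffixedLiteral {

    static Object parse(String literal) {
        String s = literal.trim();
        if (s.endsWith("BD")) {
            return new BigDecimal(s.substring(0, s.length() - 2));
        } else if (s.endsWith("Y")) {
            return Byte.parseByte(s.substring(0, s.length() - 1));
        } else if (s.endsWith("S")) {
            return Short.parseShort(s.substring(0, s.length() - 1));
        } else if (s.endsWith("L")) {
            return Long.parseLong(s.substring(0, s.length() - 1));
        }
        return Integer.parseInt(s);   // no suffix: plain int literal
    }

    // Canonical text, e.g. "100Y" -> "100", "3.14BD" -> "3.14".
    static String canonical(String literal) {
        Object value = parse(literal);
        return (value instanceof BigDecimal) ? ((BigDecimal) value).toPlainString() : value.toString();
    }

    public static void main(String[] args) {
        for (String lit : new String[] {"100Y", "1S", "1000L", "3.14BD", "42"}) {
            Object value = parse(lit);
            // e.g. "100Y -> Byte 100", "3.14BD -> BigDecimal 3.14"
            System.out.println(lit + " -> " + value.getClass().getSimpleName() + " " + canonical(lit));
        }
        // Raw suffixed text vs. canonical value text:
        System.out.println("100Y".equals(canonical("100Y")));   // false
    }
}
```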





Re: Add to Developer List

2015-04-10 Thread Thejas Nair
done

On Fri, Apr 10, 2015 at 12:36 PM, Reuben Kuhnert
reuben.kuhn...@cloudera.com wrote:
 Hi, Can I be added to the Hive Developer List. My apache ID is
 'sircodesalot'.

 Thank you


[jira] [Created] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP

2015-04-10 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10308:
---

 Summary: Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP
 Key: HIVE-10308
 URL: https://issues.apache.org/jira/browse/HIVE-10308
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.1, 0.14.0, 1.2.0, 1.1.0
Reporter: Selina Zhang
Assignee: Selina Zhang


Steps to reproduce:

CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, "one", 2, "two") FROM src LIMIT 1;
CREATE TABLE test (key INT);
INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;

set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=false;

select l.key from test l left outer join test_orc r on (l.key = r.a) where r.a is not null;

Stack trace:

Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
    at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)






Re: [DISCUSS] ORC separate project

2015-04-10 Thread Xuefu Zhang
To Lefty's comment - Yes, anyone can take Apache code and make another
project at will. However, changes made to an existing project as part
of that process, such as what Owen described for ORC in Hive, are
certainly something that the Hive PMC can control or vote on. Nevertheless,
that's not my immediate concern.

To Owen's explanation - Thanks. I guess my major concern is that we
seemingly are breaking apart Hive's integrity and making it hard to release
and maintain due to increasing number of external dependents. Let's say
that Hive depends on a certain version of ORC (as TLP) and it's found that
ORC has a bug that seriously impacts Hive users. We cannot release Hive as
fast as we could, since doing so would need the ORC community to fix the
problem and make a release, over which the Hive PMC has no control. By
contrast, today the Hive community can quickly fix the problem and make a
release without waiting for other projects to make a release. I'm not sure
this move (ORC as TLP) will be beneficial to the vast majority of Hive users.

If this is not convincing, let me propose that we spin off metastore also as
TLP tomorrow!

Thanks,
Xuefu


On Wed, Apr 8, 2015 at 8:33 AM, Owen O'Malley omal...@apache.org wrote:

 On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang xzh...@cloudera.com wrote:

  If I understood Allen's #2 comment, we are moving existing ORC code out of
  Hive and making it a separate project, which I definitely missed.
 

 I'm sorry that wasn't clear. Yes, most of the code that is currently in
 org.apache.hadoop.hive.ql.io.orc will move to the new project.

 The biggest change on the Hive side will be to create a new Hive module
 that defines the API that storage formats like ORC need to code against if
 they want high performance integration with Hive's vectorization. I've
 started that jira at https://issues.apache.org/jira/browse/HIVE-10171 .
 Creating this API should help us create a clean interface for storage
 formats that will help ORC and other columnar formats like Trevni or
 Parquet.

 Once the ORC project has made its first release, we can create a Hive jira
 to replace the Hive ORC code with a reference to the ORC release jar.


  Since existing Hive PMC has governance on the code, I would expect it's
  still the case even after the spinoff.
 

 No, Apache doesn't allow umbrella projects where one PMC controls
 sub-projects. The reason is that the Apache board has found that
 controlling projects directly instead of indirectly through another PMC
 reduces the problems.

 .. Owen
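To make the storage-API idea in this thread concrete (a Hive module that storage formats code against for vectorized integration, so that ql and the format no longer reach into each other's classes), here is a toy sketch. Every name in it (LongColumnBatch, BatchReader, ArrayReader) is hypothetical and is not the API proposed in HIVE-10171; it only shows the dependency shape, where the engine consumes batches through an interface and a format implements that interface without touching engine internals:

```java
// Hypothetical sketch of a storage-format API: engine code and each format
// depend only on these types. Names are illustrative, not Hive's API.
class StorageApiSketch {

    // Minimal column batch: one long column, filled in bulk by a reader.
    static final class LongColumnBatch {
        final long[] values;
        int size;
        LongColumnBatch(int capacity) { values = new long[capacity]; }
    }

    // What a format implements to feed the vectorized engine.
    interface BatchReader {
        // Fills the batch; returns false once the input is exhausted.
        boolean nextBatch(LongColumnBatch batch);
    }

    // A toy "format" backed by an in-memory array, standing in for ORC or Parquet.
    static final class ArrayReader implements BatchReader {
        private final long[] data;
        private int pos;
        ArrayReader(long[] data) { this.data = data; }

        @Override
        public boolean nextBatch(LongColumnBatch batch) {
            int n = Math.min(batch.values.length, data.length - pos);
            System.arraycopy(data, pos, batch.values, 0, n);
            batch.size = n;
            pos += n;
            return n > 0;
        }
    }

    // Engine-side code: consumes batches without knowing the concrete format.
    static long sum(BatchReader reader, int batchSize) {
        LongColumnBatch batch = new LongColumnBatch(batchSize);
        long total = 0;
        while (reader.nextBatch(batch)) {
            for (int i = 0; i < batch.size; i++) {
                total += batch.values[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        BatchReader reader = new ArrayReader(new long[] {1, 2, 3, 4, 5});
        System.out.println(sum(reader, 2));   // 15
    }
}
```

Because both sides depend only on the shared types, a cycle like ql.exec -> orc -> ql.exec.vector cannot form, and a second format could plug in by implementing the same interface.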