[jira] [Created] (HIVE-10235) Loop optimization for SIMD in ColumnDivideColumn.txt

2015-04-07 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-10235:


 Summary: Loop optimization for SIMD in ColumnDivideColumn.txt
 Key: HIVE-10235
 URL: https://issues.apache.org/jira/browse/HIVE-10235
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Affects Versions: 1.1.0
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor


Found two loops that could be optimized for packed (SIMD) instruction sets during 
execution.
1. hasDivBy0 depends on its value from the previous iteration, which prevents the loop from being 
vectorized.
{code:java}
for(int i = 0; i != n; i++) {
  OperandType2 denom = vector2[i];
  outputVector[i] = vector1[0] OperatorSymbol denom;
  hasDivBy0 = hasDivBy0 || (denom == 0);
}
{code}
2. Same as HIVE-10180: the vector2\[0\] reference prevents the JVM from optimizing the loop into 
packed instructions.
{code:java}
for(int i = 0; i != n; i++) {
  outputVector[i] = vector1[i] OperatorSymbol vector2[0];
}
{code}
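
One possible restructuring of the first loop, as a minimal sketch rather than the actual patch: the division and the zero check are split into two passes, so the division loop carries no dependency between iterations. The class and method names are hypothetical, and `double` operands are an assumption (the template is generic over operand types). Whether HotSpot really emits packed instructions for these loops depends on the JVM version and CPU. The second loop from the description would follow the same hoisting pattern for vector2\[0\].

```java
final class DivideLoopSketch {
    /**
     * Divides a scalar numerator (vector1[0]) by each denominator in vector2.
     * Returns true if any of the first n denominators was zero.
     */
    static boolean scalarDivideColumn(double[] vector1, double[] vector2,
                                      double[] outputVector, int n) {
        // Hoist the loop-invariant operand into a local variable.
        final double numerator = vector1[0];

        // Pass 1: pure element-wise division, no state carried between iterations.
        for (int i = 0; i != n; i++) {
            outputVector[i] = numerator / vector2[i];
        }

        // Pass 2: zero detection kept out of the arithmetic loop.
        // The non-short-circuit |= avoids a branch inside the loop body.
        boolean hasDivBy0 = false;
        for (int i = 0; i != n; i++) {
            hasDivBy0 |= (vector2[i] == 0);
        }
        return hasDivBy0;
    }

    public static void main(String[] args) {
        double[] out = new double[3];
        boolean sawZero = scalarDivideColumn(
            new double[] {6.0}, new double[] {1.0, 2.0, 0.0}, out, 3);
        System.out.println("hasDivBy0 = " + sawZero);
    }
}
```

The cost of this layout is a second pass over vector2; whether that is cheaper than the fused loop is something a benchmark would have to confirm.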



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10236) LLAP: Certain errors are not reported to the AM when a fragment fails

2015-04-07 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10236:
-

 Summary: LLAP: Certain errors are not reported to the AM when a 
fragment fails
 Key: HIVE-10236
 URL: https://issues.apache.org/jira/browse/HIVE-10236
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap








Re: Review Request 32918: HIVE-10180 Loop optimization for SIMD in ColumnArithmeticColumn.txt

2015-04-07 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32918/
---

(Updated April 7, 2015, 7:24 a.m.)


Review request for hive.


Changes
---

mark variables as final.


Bugs: Hive-10180
https://issues.apache.org/jira/browse/Hive-10180


Repository: hive


Description
---

The JVM is quite strict about the code patterns it will execute with SIMD 
instructions. Take a loop in DoubleColAddDoubleColumn.java, for example:
for (int i = 0; i != n; i++) {
  outputVector[i] = vector1[0] + vector2[i];
}
The vector1[0] reference prevents the JVM from executing this code with 
vectorized instructions. We need to assign vector1[0] to a variable 
outside the loop and use that variable inside the loop.
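
The hoisting described above could look roughly like the following. This is only a sketch: the class and method names are hypothetical, not the generated Hive code. Note that hoisting changes behavior if outputVector and vector1 are the same array, since the original loop would re-read vector1[0] after it had been overwritten; the sketch assumes they are distinct.

```java
final class HoistScalarSketch {
    /** Adds the scalar vector1[0] to each of the first n elements of vector2. */
    static void addScalarToColumn(double[] vector1, double[] vector2,
                                  double[] outputVector, int n) {
        // Read the invariant operand once, outside the loop, and mark it final.
        final double vector1Value = vector1[0];
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1Value + vector2[i];
        }
    }

    public static void main(String[] args) {
        double[] out = new double[3];
        addScalarToColumn(new double[] {1.5}, new double[] {1.0, 2.0, 3.0}, out, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```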


Diffs (updated)
-

  trunk/ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt 
1671736 

Diff: https://reviews.apache.org/r/32918/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Created] (HIVE-10237) create external table, location path contains space ,like '/user/hive/warehouse/custom.db/uigs_kmap '

2015-04-07 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-10237:
---

 Summary: create external table, location  path contains space 
,like '/user/hive/warehouse/custom.db/uigs_kmap ' 
 Key: HIVE-10237
 URL: https://issues.apache.org/jira/browse/HIVE-10237
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
 Environment: Hadoop 2.3.0-cdh5.0.0 
hive 0.13.1
Reporter: xiaowei wang


When I created an external table and gave it a location, I mistakenly wrote a 
location path, '/user/hive/warehouse/custom.db/uigs_kmap ', which 
contains a space at the end. I expected Hive to trim the trailing space from 
the location, but it does not.





Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-07 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32920/
---

(Updated April 7, 2015, 6:06 a.m.)


Review request for hive and chengxiang li.


Summary (updated)
-

HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the 
performance gain after SIMD optimization


Repository: hive-git


Description
---

Add microbenchmark tool to show performance improvement by JMH


Diffs
-

  
itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/32920/diff/


Testing
---


Thanks,

cheng xu



Review Request 32920: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-07 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32920/
---

Review request for hive and chengxiang li.


Repository: hive-git


Description
---

Add microbenchmark tool to show performance improvement by JMH


Diffs
-

  
itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/32920/diff/


Testing
---


Thanks,

cheng xu



Re: ORC separate project

2015-04-07 Thread Brock Noland
Hey guys,

Good discussion here. One point of order, I feel like this should be a
[DISCUSS] thread. Some folks filter on that specific text as it's
quite standard in Apache to use that subject prefix for big issues
like this one.

Brock

On Fri, Apr 3, 2015 at 3:59 PM, Thejas Nair thejas.n...@gmail.com wrote:
 On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz leftylever...@gmail.com
 wrote:

 Hive users who wished to use ORC would obviously need to pull in ORC
 artifacts in addition to Hive.


 What would happen with Hive features that (currently) only work with ORC?
 Would they be extended to work with other file formats and stay in Hive?
 What about future features -- would they have to work with multiple file
 formats from the get-go?



 The storage-api module proposed above would lead to clearer storage
 interfaces in hive. That will in turn help to implement such features using
 other storage including parquet, hbase etc.
 The result of this work will not automatically make those features work
 with ORC, somebody would need to do that.

 Whether future features would work for all formats would depend on whether
 the new feature needs new functionality to be supported by the storage
 layer. If the feature needs new storage functionality, I would expect new
 interfaces to be defined in hive, and then implemented by the storage
 engines that want to support that feature.

 This will not negatively impact experience of users with respect to ORC or
 other storage formats. The way we package parquet in hive, we can package
 ORC as well. In fact, users would more easily be able to upgrade their
 version of ORC being used, as releases can happen independent of each other.


Re: ORC separate project

2015-04-07 Thread Lefty Leverenz
Is there a way to change this to a DISCUSS thread?  Or could everything be
copied into a new thread?  Or just start a new thread with a reference to
this one?

-- Lefty

On Tue, Apr 7, 2015 at 2:26 AM, Brock Noland br...@apache.org wrote:

 Hey guys,

 Good discussion here. One point of order, I feel like this should be a
 [DISCUSS] thread. Some folks filter on that specific text as it's
 quite standard in Apache to use that subject prefix for big issues
 like this one.

 Brock

 On Fri, Apr 3, 2015 at 3:59 PM, Thejas Nair thejas.n...@gmail.com wrote:
  On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz leftylever...@gmail.com
  wrote:
 
  Hive users who wished to use ORC would obviously need to pull in ORC
  artifacts in addition to Hive.
 
 
  What would happen with Hive features that (currently) only work with
 ORC?
  Would they be extended to work with other file formats and stay in Hive?
  What about future features -- would they have to work with multiple file
  formats from the get-go?
 
 
 
  The storage-api module proposed above would lead to clearer storage
  interfaces in hive. That will in turn help to implement such features
 using
  other storage including parquet, hbase etc.
  The result of this work will not automatically make those features work
  with ORC, somebody would need to do that.
 
  Whether future features would work for all formats would depend on
 whether
  the new feature needs new functionality to be supported by the storage
  layer. If the feature needs new storage functionality, I would expect new
  interfaces to be defined in hive, and then implemented by the storage
  engines that want to support that feature.
 
  This will not negatively impact experience of users with respect to ORC
 or
  other storage formats. The way we package parquet in hive, we can package
  ORC as well. In fact, users would more easily be able to upgrade their
  version of ORC being used, as releases can happen independent of each
 other.



Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-07 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32920/#review79136
---



itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
https://reviews.apache.org/r/32920/#comment128267

The benchmark looks good. My only concern is how we could extend this 
benchmark to other expressions.


- chengxiang li


On April 7, 2015, 6:06 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32920/
 ---
 
 (Updated April 7, 2015, 6:06 a.m.)
 
 
 Review request for hive and chengxiang li.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Add microbenchmark tool to show performance improvement by JMH
 
 
 Diffs
 -
 
   
 itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/32920/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 cheng xu
 




[jira] [Created] (HIVE-10238) Loop optimization for SIMD in IfExprColumnColumn.txt

2015-04-07 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-10238:


 Summary: Loop optimization for SIMD in IfExprColumnColumn.txt
 Key: HIVE-10238
 URL: https://issues.apache.org/jira/browse/HIVE-10238
 Project: Hive
  Issue Type: Sub-task
  Components: Vectorization
Affects Versions: 1.1.0
Reporter: Chengxiang Li
Assignee: Jitendra Nath Pandey
Priority: Minor


The ?: operator, as used in the following loop, cannot be vectorized; we may be able to rewrite 
it as a mathematical expression.
{code:java}
for(int j = 0; j != n; j++) {
  int i = sel[j];
  outputVector[i] = (vector1[i] == 1 ? vector2[i] : vector3[i]);
  outputIsNull[i] = (vector1[i] == 1 ?
  arg2ColVector.isNull[i] : arg3ColVector.isNull[i]);
}
{code} 
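
One way such a rewrite might look, as a sketch under stated assumptions: vector1 holds only 0 or 1 (the original test is `== 1`, so any other value would behave differently here), the vectors are `long[]`, and the sel[] indirection from the loop above is left out. The class and method names are hypothetical. Whether the JIT turns this into packed instructions is not guaranteed.

```java
final class BranchFreeSelectSketch {
    /** outputVector[i] = vector2[i] where vector1[i] is 1, else vector3[i]. */
    static void selectValues(long[] vector1, long[] vector2, long[] vector3,
                             long[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            // Assumes vector1[i] is 0 or 1: -1L is all ones, 0L is all zeros.
            final long mask = -vector1[i];
            outputVector[i] = (vector2[i] & mask) | (vector3[i] & ~mask);
        }
    }

    /** Same selection for the null flags, using non-short-circuit operators. */
    static void selectNulls(long[] vector1, boolean[] isNull2, boolean[] isNull3,
                            boolean[] outputIsNull, int n) {
        for (int i = 0; i != n; i++) {
            final boolean takeSecond = vector1[i] == 1;
            outputIsNull[i] = (takeSecond & isNull2[i]) | (!takeSecond & isNull3[i]);
        }
    }

    public static void main(String[] args) {
        long[] out = new long[3];
        selectValues(new long[] {1, 0, 1}, new long[] {10, 20, -30},
                     new long[] {-1, -2, -3}, out, 3);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```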





Re: Re: hive 0.14 on some platform return some not NULL value as NULL

2015-04-07 Thread r7raul1...@163.com
I pointed Hive 0.14 at the Hive 0.10 metastore server, and the problem is fixed. Hive 
0.14 now returns the correct result.



r7raul1...@163.com
 
From: r7raul1...@163.com
Date: 2015-04-07 10:34
To: dev
CC: thejas.nair
Subject: Re: Re: hive 0.14 on some platform return some not NULL value as NULL
 
 
I found a difference in the logs:
In hive 0.14 
DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, 
city_id, landing_page_type_id, landing_track_time, landing_url, 
nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, 
nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, 
nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, 
app_vers, nav_link_position, nav_button_position, nav_track_time, 
nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, 
detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, 
detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, 
cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, 
ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, 
brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, 
nav_page_url, detl_button_position, manul_flag, manul_track_date, 
nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, 
nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, 
nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, 
nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, 
detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, 
cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, 
cart_tcd, cart_tci, cart_postn_type] columnTypes=[string, bigint, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, int, string, string, 
string, string, string, string, int, string, string, string, bigint, string, 
string, string, string, string, string, string, string, bigint, string, string, 
string, string, bigint, string, int, string, string, string, int, string, 
string, int, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string] separator=[[B@e50bca4] nullstring=\N 
lastColumnTakesRest=false 
 
In hive 0.10 
DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, 
city_id, landing_page_type_id, landing_track_time, landing_url, 
nav_refer_tracker_id, nav_refer_page_type_id, nav_refer_page_value, 
nav_refer_link_position, nav_tracker_id, nav_page_categ_id, nav_page_type_id, 
nav_page_value, nav_srce_type, internal_keyword, internal_result_sum, pltfm_id, 
app_vers, nav_link_position, nav_button_position, nav_track_time, 
nav_next_tracker_id, sessn_last_time, sessn_pv, detl_tracker_id, 
detl_page_type_id, detl_page_value, detl_pm_id, detl_link_position, 
detl_position_track_id, cart_tracker_id, cart_page_type_id, cart_page_value, 
cart_link_postion, cart_button_position, cart_position_track_id, cart_prod_id, 
ordr_tracker_id, ordr_page_type_id, ordr_code, updt_time, cart_pm_id, 
brand_code, categ_type, os, end_user_id, add_cart_flag, navgation_page_flag, 
nav_page_url, detl_button_position, manul_flag, manul_track_date, 
nav_refer_tpa, nav_refer_tpa_id, nav_refer_tpc, nav_refer_tpi, nav_refer_tcs, 
nav_refer_tcsa, nav_refer_tcdt, nav_refer_tcd, nav_refer_tci, 
nav_refer_postn_type, nav_tpa_id, nav_tpa, nav_tpc, nav_tpi, nav_tcs, nav_tcsa, 
nav_tcdt, nav_tcd, nav_tci, nav_postn_type, detl_tpa_id, detl_tpa, detl_tpc, 
detl_tpi, detl_tcs, detl_tcsa, detl_tcdt, detl_tcd, detl_tci, detl_postn_type, 
cart_tpa_id, cart_tpa, cart_tpc, cart_tpi, cart_tcs, cart_tcsa, cart_tcdt, 
cart_tcd, cart_tci, cart_postn_type, sessn_chanl_id, gu_sec_flg, 
detl_refer_page_type_id, detl_refer_page_value, detl_event_id, 
nav_refer_intrn_reslt_sum, nav_intrn_reslt_sum, nav_refer_intrn_kw, 
nav_intrn_kw, detl_track_time, cart_track_time] columnTypes=[string, bigint, 
string, string, string, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, string, int, string, 
string, string, string, string, string, int, string, string, string, bigint, 
string, string, string, string, string, string, string, string, bigint, string, 
string, string, string, bigint, string, int, string, string, string, int, 
string, string, int, string, string, string, string, string, string, string, 
string, string, string, string, string, string, string, 

[jira] [Created] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL

2015-04-07 Thread Naveen Gangam (JIRA)
Naveen Gangam created HIVE-10239:


 Summary: Create scripts to do metastore upgrade tests on jenkins 
for Derby, Oracle and PostgreSQL
 Key: HIVE-10239
 URL: https://issues.apache.org/jira/browse/HIVE-10239
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


Need to create DB-implementation specific scripts to use the framework 
introduced in HIVE-9800 to have any metastore schema changes tested across all 
supported databases.





[GitHub] hive pull request: Update HiveDatabaseMetaData.java change the ide...

2015-04-07 Thread Jeffrio
GitHub user Jeffrio opened a pull request:

https://github.com/apache/hive/pull/31

Update HiveDatabaseMetaData.java change the identifierQuoteString

according to this jira https://issues.apache.org/jira/browse/HIVE-6013
Hive uses the backtick as the quote string,
so I think the getIdentifierQuoteString() function should return the 
backtick rather than a space
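
To illustrate why the return value matters to JDBC clients, here is a minimal sketch of how a client might use the string returned by `DatabaseMetaData.getIdentifierQuoteString()`. Per the JDBC contract, a single space means identifier quoting is not supported. The helper class and method are hypothetical, and the sketch does not escape quote characters embedded in the identifier.

```java
final class IdentifierQuoteSketch {
    /**
     * Wraps an identifier in the driver's quote string.
     * A null or blank quote string is treated as "quoting not supported".
     */
    static String quote(String identifier, String quoteString) {
        if (quoteString == null || quoteString.trim().isEmpty()) {
            return identifier;
        }
        return quoteString + identifier + quoteString;
    }

    public static void main(String[] args) {
        // With a backtick, a client can safely reference unusual column names.
        System.out.println(quote("emp no", "`"));
        // With a space, the client has to leave the identifier unquoted.
        System.out.println(quote("emp no", " "));
    }
}
```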

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Jeffrio/hive patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/31.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #31


commit 5ac637c83615aa389db49ce169c0df0461619c63
Author: Jeffrio corej...@163.com
Date:   2015-04-07T16:35:11Z

Update HiveDatabaseMetaData.java change the identifierQuoteString

according to this jira https://issues.apache.org/jira/browse/HIVE-6013
Hive uses the backtick as the quote string,
so I think the getIdentifierQuoteString() function should return the 
backtick rather than a space




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Review Request 32809: Disallow create table with dot/colon in column name

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32809/#review79229
---

Ship it!


Ship It!

- John Pullokkaran


On April 7, 2015, 6:18 p.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32809/
 ---
 
 (Updated April 7, 2015, 6:18 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Since we don't allow users to query column names with dot in the middle such 
 as emp.no, don't allow users to create tables with such columns that cannot 
 be queried. Fix the documentation to reflect this fix.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da 
   
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/32809/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




[jira] [Created] (HIVE-10241) ACID: drop table doesn't acquire any locks

2015-04-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10241:
-

 Summary: ACID: drop table doesn't acquire any locks
 Key: HIVE-10241
 URL: https://issues.apache.org/jira/browse/HIVE-10241
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


with Hive configured to use DbTxnManager, in DbTxnManager.acquireLocks() both 
plan.getInputs() and plan.getOutputs() are empty when drop table foo is 
executed and thus no locks are acquired.  We should be acquiring X locks to 
make sure any readers of this table don't get data wiped out while read is in 
progress.





Re: Review Request 32809: Disallow create table with dot/colon in column name

2015-04-07 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32809/
---

(Updated April 7, 2015, 6:18 p.m.)


Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Changes
---

Thanks to Swarnim Kulkarni for the comments. I tried to answer and address them.


Repository: hive-git


Description
---

Since we don't allow users to query column names with dot in the middle such as 
emp.no, don't allow users to create tables with such columns that cannot be 
queried. Fix the documentation to reflect this fix.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da 
  
ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/32809/diff/


Testing
---


Thanks,

pengcheng xiong



Re: Review Request 32370: HIVE-10040

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/#review79228
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java
https://reviews.apache.org/r/32370/#comment128449

As we discussed:
1. Move the supported JoinAlgorithm to Sub Class (i.e target exec engine)
2. Move Cost Computation to sub class/target exec engine
3. The logic here should consult the target exec engine for its supported 
algorithms, iterate through them, and find the cheapest one without actually 
knowing anything about the algorithms themselves.
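
A rough sketch of the shape these three points suggest. All names here (`JoinAlgorithm`, `CostModel`, `FixedCost`, `cheapest`) are hypothetical and are not the classes in the patch; the cost is reduced to a single `double` for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class CheapestJoinSketch {
    /** A join algorithm that knows how to cost itself (hypothetical interface). */
    interface JoinAlgorithm {
        String name();
        double cost(long leftRows, long rightRows);
    }

    /** Implemented per target exec engine; lists what that engine supports. */
    interface CostModel {
        List<JoinAlgorithm> supportedAlgorithms();
    }

    /** Trivial algorithm with a constant cost, for demonstration only. */
    static final class FixedCost implements JoinAlgorithm {
        private final String name;
        private final double cost;
        FixedCost(String name, double cost) { this.name = name; this.cost = cost; }
        public String name() { return name; }
        public double cost(long leftRows, long rightRows) { return cost; }
    }

    /** Picks the cheapest supported algorithm; returns null if none are supported. */
    static JoinAlgorithm cheapest(CostModel model, long leftRows, long rightRows) {
        JoinAlgorithm best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (JoinAlgorithm candidate : model.supportedAlgorithms()) {
            double c = candidate.cost(leftRows, rightRows);
            if (best == null || c < bestCost) {
                best = candidate;
                bestCost = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CostModel engine = () -> Arrays.<JoinAlgorithm>asList(
            new FixedCost("common-join", 10.0), new FixedCost("map-join", 3.0));
        System.out.println(cheapest(engine, 1000L, 10L).name());
    }
}
```

The selection loop only sees the two interfaces, so adding an algorithm to one engine would not require touching it.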


- John Pullokkaran


On April 6, 2015, 9:30 p.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32370/
 ---
 
 (Updated April 6, 2015, 9:30 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-10040
 https://issues.apache.org/jira/browse/HIVE-10040
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
 7adb38342bfaf72f152a16006bc0bfecbb28f5ed 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java
  977313a5a632329fc963daf7ff276ccdd59ce7c5 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 
 41604cd0af68e7f90296fa271c42debc5aaf743a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java
  9a8a5da81b92c7c1f33d1af8072b1fb94e237290 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java
  3e45a3fbed3265b126a3ff9b6ffe44bee24453ef 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java
  f411d9029cf244b66ef1d1591ea55f11f7cb9d27 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java
  5fc64f3e8c97fc8988bc35be39dbabf78dd7de24 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java
  6c215c96190f0fcebe063b15c2763c49ebf1faaf 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java
  fcf09a5de0e318c6fb69664a8dd618f2d9ae84e5 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java
  4984683c3c8c6c0378a22e21fd6d961f3901f25c 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java
  f846dd19899af51194f3407ef913fcb9bcc24977 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java
  dabbe280278dc80f00f0240a0c615fe6c7b8533a 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java
  95515b23e409d73d5c61e107931727add3f992a6 
 
 Diff: https://reviews.apache.org/r/32370/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jesús Camacho Rodríguez
 




[jira] [Created] (HIVE-10242) ACID: insert overwrite prevents create table command

2015-04-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10242:
-

 Summary: ACID: insert overwrite prevents create table command
 Key: HIVE-10242
 URL: https://issues.apache.org/jira/browse/HIVE-10242
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


1. insert overwrite table DB.T1 select ... from T2: this takes X lock on DB.T1 
and S lock on T2.
X lock makes sense because we don't want anyone reading T1 while it's 
overwritten. S lock on T2 prevents it from being dropped while the query is in 
progress.
2. create table DB.T3: takes S lock on DB.
This S lock gets blocked by X lock on T1. S lock prevents the DB from being 
dropped while create table is executed.

If the insert statement is long running, this blocks DDL ops on the same 
database.  This is a usability issue.  
There is no good reason why X lock on a table within a DB and S lock on DB 
should be in conflict.  

(this is different from a situation where X lock is on a partition and S lock 
is on the table to which this partition belongs.  Here it makes sense.  
Basically there is no SQL way to address all tables in a DB but you can easily 
refer to all partitions of a table)





Re: Review Request 32809: Disallow create table with dot/colon in column name

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32809/#review79203
---



ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
https://reviews.apache.org/r/32809/#comment128410

Please address Swarnim's comments


- John Pullokkaran


On April 3, 2015, 6:24 a.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32809/
 ---
 
 (Updated April 3, 2015, 6:24 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Since we don't allow users to query column names with dot in the middle such 
 as emp.no, don't allow users to create tables with such columns that cannot 
 be queried. Fix the documentation to reflect this fix.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da 
   
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/32809/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




Re: Review Request 32809: Disallow create table with dot/colon in column name

2015-04-07 Thread pengcheng xiong


 On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 632
  https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line632
 
  If you choose to use a String.contains, this could as well be a 
  character array.

Thanks for your comment. But I assume that char is enough for my purpose.


 On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 634
  https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line634
 
  Why not simply use string.contains here?

The contains method is implemented using a call to indexOf, so they are 
essentially the same.

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}

But in my case, I only want to check whether a string contains a char, rather 
than a CharSequence, so I think indexOf is the better fit.


 On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 635
  https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line635
 
  This and the following line can be simplified as 
  
  return input.indexOf(c);

The purpose of the function is to test whether a string contains a char. The 
index is only used to check whether the char is present; the position itself 
is not needed. In other words, a boolean return value is enough for 
my purpose.
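
For reference, the kind of helper being discussed could be sketched as follows. The class and method names are hypothetical and the code is not copied from the patch; it relies only on `String.indexOf(int)`, which returns -1 when the character is absent.

```java
final class ContainsCharSketch {
    /** Returns true if input contains the character c anywhere. */
    static boolean containsChar(String input, char c) {
        // indexOf returns the first position of c, or -1 if it does not occur.
        return input.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsChar("emp.no", '.'));
        System.out.println(containsChar("empno", '.'));
    }
}
```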


- pengcheng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32809/#review79121
---


On April 3, 2015, 6:24 a.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32809/
 ---
 
 (Updated April 3, 2015, 6:24 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Since we don't allow users to query column names with dot in the middle such 
 as emp.no, don't allow users to create tables with such columns that cannot 
 be queried. Fix the documentation to reflect this fix.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da 
   
 ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/32809/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




Re: Review Request 32370: HIVE-10040

2015-04-07 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/
---

(Updated April 7, 2015, 7:12 p.m.)


Review request for hive and John Pullokkaran.


Changes
---

Address John's comments.


Bugs: HIVE-10040
https://issues.apache.org/jira/browse/HIVE-10040


Repository: hive-git


Description
---

CBO (Calcite Return Path): Pluggable cost modules [CBO branch]


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6

Diff: https://reviews.apache.org/r/32370/diff/


Testing
---


Thanks,

Jesús Camacho Rodríguez



Re: Review Request 32406: Add another level of explain for RDBMS audience

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32406/#review79199
---

Ship it!


Ship It!

- John Pullokkaran


On April 7, 2015, 12:42 a.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32406/
 ---
 
 (Updated April 7, 2015, 12:42 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The current (default) Hive EXPLAIN output is targeted at a MapReduce audience.
 We need a new level of explain plan targeted at an RDBMS audience. It should
 meet these requirements:
 1) The focus needs to be on what part of the query is being executed rather
 than on the internals of the engines.
 2) There needs to be a clearly readable tree of operations.
 3) Examples: a table scan should mention the table being scanned, the Sarg,
 the size of the table and the expected cardinality after the Sarg'ed read. A
 join should mention the table being joined with and the join condition. An
 aggregate should mention the columns in the group-by.
 
 
 Diffs
 -
 
   common/pom.xml 5b0e78c 
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/JsonParser.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/JsonParserFactory.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Attr.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Connection.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Op.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Stage.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/TezJsonParser.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java PRE-CREATION
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cc16c38
   itests/src/test/resources/testconfiguration.properties 288270e
   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0f7da53
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 149f911
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java e572338
   ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanWork.java 095afd4
   ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateWork.java 092f627
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/CalciteSemanticException.java a71cd35
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c9
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java 29134a4
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 5c0616e
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java 8c3587e
   ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java eaf3dc4
   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java c8bf7dc
   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 38b6d96
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1f6d53d
   ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 9834fc8
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterDatabaseDesc.java e45bc26
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java db2cf7f
   ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 24cf1da
   ql/src/java/org/apache/hadoop/hive/ql/plan/ArchiveWork.java 9fb5c8b
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 6ab75a7
   ql/src/java/org/apache/hadoop/hive/ql/plan/BucketMapJoinContext.java f436bc0
   ql/src/java/org/apache/hadoop/hive/ql/plan/CollectDesc.java 588e14d
   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java a44c8e8
   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java d644155
   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java 3cae727 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CommonMergeJoinDesc.java 2354139 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CopyWork.java 3353384 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateDatabaseDesc.java a6b52aa 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateFunctionDesc.java dce5ece 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateMacroDesc.java 3c5a723 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 8cadb96 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLikeDesc.java 3dad4ab 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java dd76a82 
   

[jira] [Created] (HIVE-10240) Patch HIVE-9473 breaks KERBEROS

2015-04-07 Thread Olaf Flebbe (JIRA)
Olaf Flebbe created HIVE-10240:
--

 Summary: Patch HIVE-9473 breaks KERBEROS
 Key: HIVE-10240
 URL: https://issues.apache.org/jira/browse/HIVE-10240
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 1.0.0
Reporter: Olaf Flebbe
 Fix For: 1.0.1


The patch from HIVE-9473 introduces a regression. HiveServer2 no longer starts 
properly with our configuration (more or less the Bigtop environment):

SQL standard authorization enabled, enableDoAs disabled, Tez enabled, Kerberos enabled.

The problem seems to be that the Kerberos ticket is not present when HiveServer2 
first tries to access HDFS. When HIVE-9473 is reverted, getting the ticket is 
one of the first things HiveServer2 does.


Posting the startup log of vanilla hive-1.0.0 and the startup log of hive-1.0.0 
with the following commit reverted, where HiveServer2 starts correctly.
{code}
commit 35582c2065a6b90b003a656bdb3b0ff08b0c35b9
Author: Thejas Nair the...@apache.org
Date:   Fri Jan 30 00:05:50 2015 +

HIVE-9473 : sql std auth should disallow built-in udfs that allow any java 
methods to be called (Thejas Nair, reviewed by Jason Dere)

git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/branch-1.0@1655891 
13f79535-47bb-0310-9956-ffa450edef68
{code}


Startup of vanilla hive-1.0.0 hive-server2 
{code}
STARTUP_MSG:   build = 
git://os2-debian80/net/os2-debian80/fs1/olaf/bigtop/output/hive/hive-1.0.0 -r 
813996292c9f966109f990127ddd5673cf813125; compiled by 'olaf' on Tue Apr 7 
09:33:01 CEST 2015
/
2015-04-07 10:23:52,579 INFO  [main]: server.HiveServer2 
(HiveServer2.java:startHiveServer2(292)) - Starting HiveServer2
2015-04-07 10:23:53,104 INFO  [main]: metastore.HiveMetaStore 
(HiveMetaStore.java:newRawStore(556)) - 0: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore
2015-04-07 10:23:53,135 INFO  [main]: metastore.ObjectStore 
(ObjectStore.java:initialize(264)) - ObjectStore, initialize called
2015-04-07 10:23:54,775 INFO  [main]: metastore.ObjectStore 
(ObjectStore.java:getPMF(345)) - Setting MetaStore object pin classes with 
hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order
2015-04-07 10:23:56,953 INFO  [main]: metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:init(132)) - Using direct SQL, underlying DB is DERBY
2015-04-07 10:23:56,954 INFO  [main]: metastore.ObjectStore 
(ObjectStore.java:setConf(247)) - Initialized ObjectStore
2015-04-07 10:23:57,275 INFO  [main]: metastore.HiveMetaStore 
(HiveMetaStore.java:createDefaultRoles_core(630)) - Added admin role in 
metastore
2015-04-07 10:23:57,276 INFO  [main]: metastore.HiveMetaStore 
(HiveMetaStore.java:createDefaultRoles_core(639)) - Added public role in 
metastore
2015-04-07 10:23:58,241 WARN  [main]: ipc.Client (Client.java:run(675)) - 
Exception encountered while connecting to the server : 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
2015-04-07 10:23:58,248 WARN  [main]: ipc.Client (Client.java:run(675)) - 
Exception encountered while connecting to the server : 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
2015-04-07 10:23:58,249 INFO  [main]: retry.RetryInvocationHandler 
(RetryInvocationHandler.java:invoke(140)) - Exception while invoking 
getFileInfo of class ClientNamenodeProtocolTranslatorPB over 
node2.proto.bsi.de/192.168.100.22:8020 after 1 fail over attempts. Trying to 
fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: node2.proto.bsi.de/192.168.100.22; 
destination host is: node2.proto.bsi.de:8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

Re: Review Request 32809: Disallow create table with dot/colon in column name

2015-04-07 Thread Swarnim Kulkarni


 On April 7, 2015, 4:45 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 635
  https://reviews.apache.org/r/32809/diff/1/?file=914560#file914560line635
 
  This and the following line can be simplified as 
  
  return input.indexOf(c);
 
 pengcheng xiong wrote:
 The purpose of the function is to test whether a string contains a char. 
 The actual index is only used to check whether the char is present; the 
 detailed position information is not needed. That is to say, a boolean 
 return value is enough for my purpose.

Yup. I think your logic is correct. Just having a return input.indexOf(c); 
serves the same purpose but is simpler. :)
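
For illustration only (this is not code from the patch): if the helper is meant to return a boolean, as described above, the result of `indexOf` still needs a comparison, because `String.indexOf` returns an int position, or -1 when the character is absent. A minimal Java sketch, in which the class and method names are hypothetical:

```java
// Hypothetical helper; the names are illustrative and not taken from HiveParser.g.
final class ColumnNameCheckSketch {
    private ColumnNameCheckSketch() {}

    /** Returns true if input contains c; indexOf yields -1 when c is absent. */
    static boolean containsChar(String input, char c) {
        return input.indexOf(c) >= 0;
    }

    /** Returns true if the column name contains a dot or a colon. */
    static boolean hasUnpermittedChar(String columnName) {
        return containsChar(columnName, '.') || containsChar(columnName, ':');
    }

    public static void main(String[] args) {
        System.out.println(hasUnpermittedChar("emp.no"));
        System.out.println(hasUnpermittedChar("empno"));
    }
}
```

Returning the raw index instead would change the helper's return type to int and leave the comparison to each caller.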


- Swarnim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32809/#review79121
---


On April 7, 2015, 6:18 p.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32809/
 ---
 
 (Updated April 7, 2015, 6:18 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Since we don't allow users to query column names with dot in the middle such 
 as emp.no, don't allow users to create tables with such columns that cannot 
 be queried. Fix the documentation to reflect this fix.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 2e583da 
   
   ql/src/test/org/apache/hadoop/hive/ql/parse/TestUnpermittedCharsInColumnNameCreateTableNegative.java PRE-CREATION
 
 Diff: https://reviews.apache.org/r/32809/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




Re: Review Request 32370: HIVE-10040

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/#review79235
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java
https://reviews.apache.org/r/32370/#comment128458

This could be in subclass; this will make HiveCostModel opaque to a 
specific exec engine's algorithms.
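
As a rough illustration of the design suggested here, the sketch below keeps the base cost model free of engine-specific algorithms and lets a subclass supply them. It is an assumption-laden sketch, not the code under review: every class, method, and cost weight is hypothetical, and the join-algorithm framing is only inferred from the follow-up issue HIVE-10243 ("Introduce JoinAlgorithm Interface").

```java
import java.util.Arrays;
import java.util.List;

// All names and numbers here are hypothetical; none come from the patch.
interface JoinAlgorithmSketch {
    String name();
    double cost(double leftRows, double rightRows);
}

abstract class CostModelSketch {
    // Each execution engine contributes its own algorithms through a subclass.
    protected abstract List<JoinAlgorithmSketch> algorithms();

    // Engine-neutral logic: pick the cheapest algorithm the subclass offers.
    final JoinAlgorithmSketch cheapestJoin(double leftRows, double rightRows) {
        JoinAlgorithmSketch best = null;
        for (JoinAlgorithmSketch candidate : algorithms()) {
            if (best == null
                    || candidate.cost(leftRows, rightRows) < best.cost(leftRows, rightRows)) {
                best = candidate;
            }
        }
        return best; // null when the subclass offers no algorithms
    }
}

// A linear cost function with arbitrary weights, used only for the example.
final class WeightedJoinSketch implements JoinAlgorithmSketch {
    private final String name;
    private final double leftWeight;
    private final double rightWeight;

    WeightedJoinSketch(String name, double leftWeight, double rightWeight) {
        this.name = name;
        this.leftWeight = leftWeight;
        this.rightWeight = rightWeight;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public double cost(double leftRows, double rightRows) {
        return leftWeight * leftRows + rightWeight * rightRows;
    }
}

// Engine-specific subclass: the only place that knows which algorithms exist.
final class TezLikeCostModelSketch extends CostModelSketch {
    @Override
    protected List<JoinAlgorithmSketch> algorithms() {
        return Arrays.<JoinAlgorithmSketch>asList(
                new WeightedJoinSketch("shuffle-join", 1.0, 1.0),
                new WeightedJoinSketch("map-join", 0.1, 3.0));
    }
}
```

With this split, supporting another execution engine would mean adding a subclass rather than editing the base class.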


- John Pullokkaran


On April 7, 2015, 7:12 p.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32370/
 ---
 
 (Updated April 7, 2015, 7:12 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-10040
 https://issues.apache.org/jira/browse/HIVE-10040
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6
 
 Diff: https://reviews.apache.org/r/32370/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jesús Camacho Rodríguez
 




Re: Review Request 32370: HIVE-10040

2015-04-07 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/#review79253
---

Ship it!


Ship It!

- John Pullokkaran


On April 7, 2015, 7:18 p.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32370/
 ---
 
 (Updated April 7, 2015, 7:18 p.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-10040
 https://issues.apache.org/jira/browse/HIVE-10040
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6
 
 Diff: https://reviews.apache.org/r/32370/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jesús Camacho Rodríguez
 




Can anyone review HIVE-9864 Create UDF jsonpath which support full JsonPath syntax

2015-04-07 Thread Alexander Pivovarov
Hi Everyone

Can anyone review HIVE-9864 Create UDF jsonpath which support full JsonPath
syntax ?

It uses the Jayway JsonPath 2.0.0 library to resolve JsonPath expressions:
https://github.com/jayway/JsonPath

The new jsonpath UDF supports the full JsonPath syntax, whereas the old
get_json_object UDF supports only a limited subset of it.

https://issues.apache.org/jira/browse/HIVE-9864

https://reviews.apache.org/r/32387/diff/#

Thank you
Alex


[jira] [Created] (HIVE-10243) Introduce JoinAlgorithm Interface

2015-04-07 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-10243:
-

 Summary: Introduce JoinAlgorithm Interface
 Key: HIVE-10243
 URL: https://issues.apache.org/jira/browse/HIVE-10243
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Jesus Camacho Rodriguez








Re: Review Request 32920: HIVE-10189: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-07 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32920/#review79241
---



itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
https://reviews.apache.org/r/32920/#comment128472

I believe you meant to check for 'Double', right?



itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java
https://reviews.apache.org/r/32920/#comment128475

Did you mean to check for 'Long' here?


- Sergio Pena


On April 7, 2015, 6:06 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32920/
 ---
 
 (Updated April 7, 2015, 6:06 a.m.)
 
 
 Review request for hive and chengxiang li.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Add microbenchmark tool to show performance improvement by JMH
 
 
 Diffs
 -
 
   
   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/vectorization/VectorizationBench.java PRE-CREATION
 
 Diff: https://reviews.apache.org/r/32920/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 cheng xu
 




Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

2015-04-07 Thread Jason Dere


 On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java,
   line 1347
  https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1347
 
  Nit: Else not needed.

The new patch will look a bit different, so this will not be needed.


 On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
  ql/src/test/results/clientpositive/compute_stats_date.q.out, line 110
  https://reviews.apache.org/r/32901/diff/1/?file=918314#file918314line110
 
  Getting rid of the tabs here would be nice.

This is query output generated by Hive; the tabs are expected here as column 
output delimiters.


 On April 7, 2015, 4:19 a.m., Swarnim Kulkarni wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java,
   line 1341
  https://reviews.apache.org/r/32901/diff/1/?file=918312#file918312line1341
 
  Would it be a little safer here to assert that parameters has at least 
  2 values in it so that we do not fail with an 
  ArrayIndexOutOfBoundsException?

GenericUDAFComputeStats.getEvaluator(), which instantiates the various 
StatsEvaluators, is already doing the initial checking of the array size. The 
other StatsEvaluators for the other types are relying on this as well.


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/#review79119
---


On April 6, 2015, 9:01 p.m., Jason Dere wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32901/
 ---
 
 (Updated April 6, 2015, 9:01 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and Prasanth_J.
 
 
 Bugs: HIVE-10226
 https://issues.apache.org/jira/browse/HIVE-10226
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Re-use the long stats for Date column stats, using the days since epoch value 
 as the long value.
 
 
 Diffs
 -
 
   
   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b
   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00
   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b
   ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
   ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/32901/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jason Dere
 




Re: Review Request 32370: HIVE-10040

2015-04-07 Thread John Pullokkaran


 On April 7, 2015, 8:38 p.m., John Pullokkaran wrote:
  Ship It!

Remaining review comment will be addressed in HIVE-10243.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/#review79253
---






Re: Review Request 32370: HIVE-10040

2015-04-07 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32370/
---

(Updated April 7, 2015, 7:18 p.m.)


Review request for hive and John Pullokkaran.


Changes
---

Added override annotation that was missing.


Bugs: HIVE-10040
https://issues.apache.org/jira/browse/HIVE-10040


Repository: hive-git


Description
---

CBO (Calcite Return Path): Pluggable cost modules [CBO branch]


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7adb38342bfaf72f152a16006bc0bfecbb28f5ed
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 977313a5a632329fc963daf7ff276ccdd59ce7c5
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCost.java 41604cd0af68e7f90296fa271c42debc5aaf743a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveDefaultCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveRelMdCost.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java 9a8a5da81b92c7c1f33d1af8072b1fb94e237290
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java 3e45a3fbed3265b126a3ff9b6ffe44bee24453ef
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java f411d9029cf244b66ef1d1591ea55f11f7cb9d27
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveLimit.java 5fc64f3e8c97fc8988bc35be39dbabf78dd7de24
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveProject.java 6c215c96190f0fcebe063b15c2763c49ebf1faaf
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java c8e9b52258eb209535a2bbfe512cf2d04178cb4b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCollation.java 4984683c3c8c6c0378a22e21fd6d961f3901f25c
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistribution.java f846dd19899af51194f3407ef913fcb9bcc24977
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdMemory.java 207f402013b5c5b2d4ada5493122427ddce9270d
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdParallelism.java 95c2be50c07f0ed6da373425e60f1185cb2cfe2b
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdRowCount.java dabbe280278dc80f00f0240a0c615fe6c7b8533a
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 95515b23e409d73d5c61e107931727add3f992a6

Diff: https://reviews.apache.org/r/32370/diff/


Testing
---


Thanks,

Jesús Camacho Rodríguez



[jira] [Created] (HIVE-10246) [CBO] Table alias should be stored with Scan object, instead of Table object

2015-04-07 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-10246:
---

 Summary: [CBO] Table alias should be stored with Scan object, 
instead of Table object
 Key: HIVE-10246
 URL: https://issues.apache.org/jira/browse/HIVE-10246
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Diagnosability, Query Planning
Affects Versions: cbo-branch
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan








Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

2015-04-07 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
---

(Updated April 7, 2015, 10:05 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
---

Check for null values in Date/Decimal versions of 
MetaDataFormatUtils.convertToString()


Bugs: HIVE-10226
https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
---

Re-use the long stats for Date column stats, using the days since epoch value 
as the long value.
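
The idea of storing date statistics as long values can be sketched as follows. This is a hedged illustration rather than the patch's implementation: the class and method names are hypothetical, and it assumes `java.time.LocalDate` (Java 8 or later) purely for the day-count conversion.

```java
import java.time.LocalDate;

// Hypothetical sketch; not one of the classes touched by the patch.
final class DateAsLongStatsSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long count = 0;

    /** Folds one date into the running stats as days since 1970-01-01. */
    void add(LocalDate date) {
        long days = date.toEpochDay();
        min = Math.min(min, days);
        max = Math.max(max, days);
        count++;
    }

    long count() {
        return count;
    }

    /** Smallest day count seen; only meaningful once count() > 0. */
    long minDays() {
        return min;
    }

    /** Converts the stored maximum back to a date; only meaningful once count() > 0. */
    LocalDate maxDate() {
        return LocalDate.ofEpochDay(max);
    }

    public static void main(String[] args) {
        DateAsLongStatsSketch stats = new DateAsLongStatsSketch();
        stats.add(LocalDate.of(2015, 4, 7));
        stats.add(LocalDate.of(1970, 1, 2));
        System.out.println(stats.minDays() + " .. " + stats.maxDate());
    }
}
```

Because day counts order the same way as the dates they encode, min and max kept as longs convert back to the correct dates.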


Diffs (updated)
-

  metastore/if/hive_metastore.thrift 57bce0c
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1666dc3
  metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java bce9f0f
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c
  ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java 1662696
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 363039b
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/32901/diff/


Testing
---


Thanks,

Jason Dere



Review Request 32941: HIVE-10122

2015-04-07 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32941/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

see JIRA


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
e34ce53 

Diff: https://reviews.apache.org/r/32941/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-10245) LLAP: Make use of the timed version of getDagStatus in TezJobMonitor

2015-04-07 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10245:
-

 Summary: LLAP: Make use of the timed version of getDagStatus in 
TezJobMonitor
 Key: HIVE-10245
 URL: https://issues.apache.org/jira/browse/HIVE-10245
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap


Version of HIVE-10157 for the LLAP branch since this already works with a 
branch based on tez 0.7.





Re: Review Request 32901: HIVE-10226 Column stats for Date columns not supported

2015-04-07 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32901/
---

(Updated April 7, 2015, 9:48 p.m.)


Review request for hive, Ashutosh Chauhan and Prasanth_J.


Changes
---

Created new DateColumnStatsData in hive_metastore.thrift, which will be used 
for the date stats. Also updated describe formatter/UpdateStatsTask to handle 
Date column stats.
I've removed the generated Thrift code from diff to make this more readable.


Bugs: HIVE-10226
https://issues.apache.org/jira/browse/HIVE-10226


Repository: hive-git


Description
---

Re-use the long stats for Date column stats, using the days since epoch value 
as the long value.


Diffs (updated)
-

  metastore/if/hive_metastore.thrift 57bce0c 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 
475883b 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
 1666dc3 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
 bce9f0f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 0c46b00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java b85282c 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
 1662696 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 
363039b 
  ql/src/test/queries/clientpositive/compute_stats_date.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_date.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/32901/diff/


Testing
---


Thanks,

Jason Dere



[jira] [Created] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-04-07 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-10244:
--

 Summary: Vectorization : TPC-DS Q80 fails with 
java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is 
enabled
 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline


Query 
{code}
set hive.vectorized.execution.reduce.enabled=true;
with ssr as
 (select  s_store_id as store_id,
  sum(ss_ext_sales_price) as sales,
  sum(coalesce(sr_return_amt, 0)) as returns,
  sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
  from store_sales left outer join store_returns on
 (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
 date_dim,
 store,
 item,
 promotion
 where ss_sold_date_sk = d_date_sk
   and d_date between cast('1998-08-04' as date) 
  and (cast('1998-09-04' as date))
   and ss_store_sk = s_store_sk
   and ss_item_sk = i_item_sk
   and i_current_price > 50
   and ss_promo_sk = p_promo_sk
   and p_channel_tv = 'N'
 group by s_store_id)
 ,
 csr as
 (select  cp_catalog_page_id as catalog_page_id,
  sum(cs_ext_sales_price) as sales,
  sum(coalesce(cr_return_amount, 0)) as returns,
  sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
  from catalog_sales left outer join catalog_returns on
 (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
 date_dim,
 catalog_page,
 item,
 promotion
 where cs_sold_date_sk = d_date_sk
   and d_date between cast('1998-08-04' as date)
  and (cast('1998-09-04' as date))
and cs_catalog_page_sk = cp_catalog_page_sk
   and cs_item_sk = i_item_sk
   and i_current_price > 50
   and cs_promo_sk = p_promo_sk
   and p_channel_tv = 'N'
group by cp_catalog_page_id)
 ,
 wsr as
 (select  web_site_id,
  sum(ws_ext_sales_price) as sales,
  sum(coalesce(wr_return_amt, 0)) as returns,
  sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
  from web_sales left outer join web_returns on
 (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
 date_dim,
 web_site,
 item,
 promotion
 where ws_sold_date_sk = d_date_sk
   and d_date between cast('1998-08-04' as date)
  and (cast('1998-09-04' as date))
and ws_web_site_sk = web_site_sk
   and ws_item_sk = i_item_sk
   and i_current_price > 50
   and ws_promo_sk = p_promo_sk
   and p_channel_tv = 'N'
group by web_site_id)
  select  channel
, id
, sum(sales) as sales
, sum(returns) as returns
, sum(profit) as profit
 from 
 (select 'store channel' as channel
, concat('store', store_id) as id
, sales
, returns
, profit
 from   ssr
 union all
 select 'catalog channel' as channel
, concat('catalog_page', catalog_page_id) as id
, sales
, returns
, profit
 from  csr
 union all
 select 'web channel' as channel
, concat('web_site', web_site_id) as id
, sales
, returns
, profit
 from   wsr
 ) x
 group by channel, id with rollup
 order by channel
 ,id
 limit 100
{code}

Exception 
{code}
Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing vector batch (tag=0) 
\N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
\N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
\N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 

[jira] [Created] (HIVE-10247) [Refactor] Move Noop TableFunctionEvaluator to contrib/ module

2015-04-07 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-10247:
---

 Summary: [Refactor] Move Noop TableFunctionEvaluator to contrib/ 
module
 Key: HIVE-10247
 URL: https://issues.apache.org/jira/browse/HIVE-10247
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Ashutosh Chauhan


see comments from [HIVE-9073 
|https://issues.apache.org/jira/browse/HIVE-9073?focusedCommentId=14481894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14481894]





Re: Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-04-07 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
---

(Updated April 8, 2015, 12:40 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Address test failures.


Repository: hive-git


Description
---

The discrepancy arises because the NDV calculation for a partitioned table 
assumes that the NDV range is contained within each partition, and is 
calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns like ticket number, which naturally 
increase with the partitioned date column ss_sold_date_sk.
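
To illustrate the problem, here is a small sketch under stated assumptions: partitions are modeled as plain lists of values, and both class and method names are hypothetical. It contrasts the max-of-partition-NDV estimate with the true distinct count when partition value ranges do not overlap.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not metastore code.
final class NdvSketch {
    // Mirrors "select max(NUM_DISTINCTS) from PART_COL_STATS":
    // the largest per-partition distinct count is used as the table NDV.
    static long maxOfPartitionNdv(List<List<Long>> partitions) {
        long max = 0;
        for (List<Long> p : partitions) {
            max = Math.max(max, new HashSet<>(p).size());
        }
        return max;
    }

    // Exact distinct count across all partitions, for comparison.
    static long trueNdv(List<List<Long>> partitions) {
        Set<Long> all = new HashSet<>();
        for (List<Long> p : partitions) {
            all.addAll(p);
        }
        return all.size();
    }
}
```

With three partitions holding ticket numbers 1-3, 4-6, and 7-9, the max-based estimate is 3 while the true NDV is 9, so the estimate falls further behind as more partitions are added.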


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2b8280e 
  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION 
  
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 
74f1b01 
  
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java
 7fc04f1 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
ba27f10 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 75005aa 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 
475883b 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/31178/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-10248) LLAP: Fix merge conflicts related to HIVE-10067

2015-04-07 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-10248:


 Summary: LLAP: Fix merge conflicts related to HIVE-10067
 Key: HIVE-10248
 URL: https://issues.apache.org/jira/browse/HIVE-10248
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Some changes were lost in the recent trunk to llap merge related to HIVE-10067





[jira] [Created] (HIVE-10249) ACID: show locks should show who the lock is waiting for

2015-04-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10249:
-

 Summary: ACID: show locks should show who the lock is waiting for
 Key: HIVE-10249
 URL: https://issues.apache.org/jira/browse/HIVE-10249
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Instead of just showing the state WAITING, we should include what the lock is 
waiting for. This will make diagnostics easier.





[jira] [Created] (HIVE-10250) Optimize AuthorizationPreEventListener to reuse TableWrapper objects

2015-04-07 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-10250:
---

 Summary: Optimize AuthorizationPreEventListener to reuse 
TableWrapper objects
 Key: HIVE-10250
 URL: https://issues.apache.org/jira/browse/HIVE-10250
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Mithun Radhakrishnan


Here's the {{PartitionWrapper}} class in {{AuthorizationPreEventListener}}:
{code:java|title=AuthorizationPreEventListener.java}
public static class PartitionWrapper extends org.apache.hadoop.hive.ql.metadata.Partition {
  ...
  public PartitionWrapper(org.apache.hadoop.hive.metastore.api.Partition mapiPart,
      PreEventContext context) throws ... {
    Partition wrapperApiPart = mapiPart.deepCopy();
    Table t = context.getHandler().get_table_core(
        mapiPart.getDbName(),
        mapiPart.getTableName());
    ...
  }
{code}

{{PreAddPartitionEvent}} (and soon, {{PreDropPartitionEvent}}) correspond not 
just to a single partition, but an entire set of partitions added atomically. 
When the event is authorized, {{HMSHandler.get_table_core()}} will be called 
once for every partition in the Event instance.

Since we already make the assumption that the partition-sets correspond to a 
single table, we might as well make a single call.

I'll have a patch for this, shortly.
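
Until the patch is posted, the idea can be sketched as follows. This is only an illustration under assumptions: `Part` and `Wrapped` are hypothetical stand-ins for the metastore partition and wrapper types, and the `lookup` function stands in for {{HMSHandler.get_table_core()}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch, not the actual AuthorizationPreEventListener code.
final class SingleLookupSketch {
    // Stand-in for a metastore partition: only the owning db/table names.
    static final class Part {
        final String db;
        final String table;
        Part(String db, String table) {
            this.db = db;
            this.table = table;
        }
    }

    // Stand-in for a wrapper that needs the resolved table object.
    static final class Wrapped {
        final Part part;
        final String table; // placeholder for the resolved Table
        Wrapped(Part part, String table) {
            this.part = part;
            this.table = table;
        }
    }

    // Resolves the table once for the whole event and reuses it for every
    // partition, relying on the assumption that all partitions in an event
    // belong to the same table.
    static List<Wrapped> wrapAll(List<Part> parts,
                                 BiFunction<String, String, String> lookup) {
        List<Wrapped> out = new ArrayList<>();
        if (parts.isEmpty()) {
            return out;
        }
        Part first = parts.get(0);
        String table = lookup.apply(first.db, first.table); // single call
        for (Part p : parts) {
            out.add(new Wrapped(p, table));
        }
        return out;
    }
}
```

For an event with N partitions this makes one lookup instead of N.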





[jira] [Created] (HIVE-10252) Make PPD work for Parquet in row group level

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10252:


 Summary: Make PPD work for Parquet in row group level
 Key: HIVE-10252
 URL: https://issues.apache.org/jira/browse/HIVE-10252
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


In Hive, predicate pushdown figures out the search condition in HQL, serializes 
it, and pushes it to the file format. ORC can use the predicate to filter 
stripes. Similarly, Parquet should use the statistics saved in each row group 
to filter out row groups that do not match. But this does not work.

In {{ParquetRecordReaderWrapper}}, Hive gets splits with all row groups (client 
side) and pushes the filter to Parquet for further processing (Parquet side). 
But in {{ParquetRecordReader.initializeInternalReader()}}, if the splits have 
already been selected on the client side, the filter is not applied again.

We should make the behavior consistent in Hive. Maybe we could get the splits, 
filter them, and then pass them to Parquet. This means using the client-side 
strategy.
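
A rough sketch of what client-side filtering could look like, assuming each row group exposes min/max statistics for the filtered column. `RowGroupStats` and `selectRowGroups` are hypothetical names, not Parquet or Hive APIs, and only a simple range predicate is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of row-group elimination from min/max statistics.
final class RowGroupFilterSketch {
    // Stand-in for the per-row-group statistics of one long column.
    static final class RowGroupStats {
        final long min;
        final long max;
        RowGroupStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // Returns the indexes of row groups that may contain rows satisfying
    // "lo <= col <= hi". A group is dropped only when its [min, max] range
    // cannot overlap the predicate range, so no matching row is lost.
    static List<Integer> selectRowGroups(List<RowGroupStats> groups,
                                         long lo, long hi) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            RowGroupStats g = groups.get(i);
            if (g.max >= lo && g.min <= hi) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

Only the kept row groups would then be placed into the split handed to the reader; surviving groups still need row-level filtering, since overlap does not guarantee a match.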





[jira] [Created] (HIVE-10254) Parquet PPD support DECIMAL

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10254:


 Summary: Parquet PPD support DECIMAL
 Key: HIVE-10254
 URL: https://issues.apache.org/jira/browse/HIVE-10254
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen








[jira] [Created] (HIVE-10253) Parquet PPD support DATE

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10253:


 Summary: Parquet PPD support DATE
 Key: HIVE-10253
 URL: https://issues.apache.org/jira/browse/HIVE-10253
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


Hive should handle the DATE data type when generating and pushing the predicate 
to Parquet.





[jira] [Created] (HIVE-10255) Parquet PPD support TIMESTAMP

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10255:


 Summary: Parquet PPD support TIMESTAMP
 Key: HIVE-10255
 URL: https://issues.apache.org/jira/browse/HIVE-10255
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen








[jira] [Created] (HIVE-10256) Eliminate row groups based on the block statistics in Parquet

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10256:


 Summary: Eliminate row groups based on the block statistics in 
Parquet
 Key: HIVE-10256
 URL: https://issues.apache.org/jira/browse/HIVE-10256
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


In Parquet PPD, row groups that do not match the predicate should be 
eliminated. See {{TestOrcSplitElimination}}.





[jira] [Created] (HIVE-10257) Ensure Parquet Hive has null optimization

2015-04-07 Thread Dong Chen (JIRA)
Dong Chen created HIVE-10257:


 Summary: Ensure Parquet Hive has null optimization
 Key: HIVE-10257
 URL: https://issues.apache.org/jira/browse/HIVE-10257
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen


In Parquet statistics, a boolean value {{hasNonNullValue}} is used for each 
column chunk. Hive could use this value to skip a column, avoid null-checking 
logic, and speed up vectorization like HIVE-4478 (in the future; it is not 
completed yet).

In this Jira we could check whether this null optimization works, and make 
changes if needed.
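
As a sketch of the two decisions mentioned above, assuming a simplified statistics holder: `ChunkStats` is a hypothetical stand-in, and the `numNulls` field is an assumption added for illustration rather than something stated in this issue.

```java
// Hypothetical sketch, not Hive or Parquet code.
final class NullOptSketch {
    // Stand-in for the statistics of one column chunk.
    static final class ChunkStats {
        final boolean hasNonNullValue;
        final long numNulls; // assumed to be available alongside the flag
        ChunkStats(boolean hasNonNullValue, long numNulls) {
            this.hasNonNullValue = hasNonNullValue;
            this.numNulls = numNulls;
        }
    }

    // A chunk holding only nulls cannot satisfy a comparison against a
    // non-null constant (e.g. "col = 5"), so it can be skipped for such
    // predicates.
    static boolean canSkipForComparison(ChunkStats s) {
        return !s.hasNonNullValue;
    }

    // When no nulls were recorded, a vectorized loop can omit per-row
    // null checks.
    static boolean needsNullChecks(ChunkStats s) {
        return s.numNulls > 0;
    }
}
```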





[jira] [Created] (HIVE-10258) LLAP: orc_llap test fails again

2015-04-07 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10258:
---

 Summary: LLAP: orc_llap test fails again
 Key: HIVE-10258
 URL: https://issues.apache.org/jira/browse/HIVE-10258
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Prasanth Jayachandran


{noformat}
Caused by: java.io.IOException: java.io.IOException: java.io.IOException: 
Corruption in ORC data encountered. To skip reading corrupted data, set 
hive.exec.orc.skip.corrupt.data to true{noformat}

llap_partitioned passes





[jira] [Created] (HIVE-10251) HIVE-9664 makes hive depend on ivysettings.xml

2015-04-07 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10251:
---

 Summary: HIVE-9664 makes hive depend on ivysettings.xml
 Key: HIVE-10251
 URL: https://issues.apache.org/jira/browse/HIVE-10251
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan


HIVE-9664 makes Hive depend on the existence of ivysettings.xml; if the file is 
not present, Hive throws an NPE when instantiating a CLISessionState.





Re: ORC separate project

2015-04-07 Thread Xuefu Zhang
If I understood Alan's #2 comment correctly, we are moving existing ORC code out
of Hive and making it a separate project, which I definitely missed. Since the
existing Hive PMC has governance of the code, I would expect that to still be
the case even after the spinoff. Obviously the proposal doesn't reflect this.

Thanks,
Xuefu

On Fri, Apr 3, 2015 at 12:51 PM, Alan Gates alanfga...@gmail.com wrote:

 A couple of points:

 1) ORC isn't going into the incubator.  The proposal before the board is
 for it to go straight to TLP.  There's no graduation to depend on.
 2) As currently proposed Hive would not depend on ORC to build.  Hive
 users who wished to use ORC would obviously need to pull in ORC artifacts
 in addition to Hive.  Given this I don't think it makes any sense to fork
 ORC and have it in both places.  This actually seems the worse outcome, as
 the two will inevitably diverge.

 Alan.

   Xuefu Zhang xzh...@cloudera.com
  April 3, 2015 at 6:41
 I actually have a different thought to share along the same line.

 ORC is not a subproject in Hive. I'm not sure that performing surgery on Hive
 in order to make ORC a TLP is the best we can do. Not only may this
 bring instability to Hive, but it also makes Hive depend on an incubating
 project. Not every project graduates (though I do wish ORC success as a
 TLP); some of them fail.

 Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps whatever
 it has. This way, the new project can do whatever it wants, and the Hive
 community probably doesn't care and has no say in it. Once ORC as a TLP
 graduates, the Hive community can decide whether to go along with it and, if
 so, how to integrate with it.

 I think this will calm the current controversy, help ORC proceed faster
 as a TLP, and leave the decision to the near future.

 Thanks,
 Xuefu


   Szehon Ho sze...@cloudera.com
  April 2, 2015 at 23:54
 I also agree with this goal.

 As such, I think we should first see the proposal (JIRA?) for the
 storage-api refactoring and other related work of Orc separating as TLP
 before the actual separation happens, to make sure the separation is not
 done in a way taking us further from this goal. It may very well be this
 refactoring moves us closer to the goal, but seeing the proposal first
 would give a lot of clarity.

 Thanks
 Szehon


   Edward Capriolo edlinuxg...@gmail.com
  April 2, 2015 at 22:20
 To reiterate, one thing I want to avoid is having Hive rely on code that
 sits in several tiny silos across Apache projects, or in Apache-licensed but
 non-ASF projects. Hive is a mature TLP with a large number of committers,
 and it would not be a good situation if work often gets bottlenecked
 because changes have to be made across two projects simultaneously to commit
 a feature, especially if the two projects do not share the same committer
 list.

 I think, if it could be done perfectly, things like ORC, Parquet, or whatever
 would be provided scope dependencies, meaning the project can be built without
 a particular piece but as a whole the project still works. (That might be
 easier said than done :)


   Nick Dimiduk ndimi...@gmail.com
  April 1, 2015 at 11:51
 I think the storage-api would be very helpful for HBase integration as
 well.


   Owen O'Malley omal...@apache.org
  April 1, 2015 at 11:22




 What I'd like to see here is well defined interfaces in Hive so that any
 storage format that wants can implement them.  Hopefully that means things
 like interfaces and utility classes for acid, sargs, and vectorization move
 into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
 on this module without needing to pull in all of Hive.

 Then Hive contributors would only be forced to make changes in Orc when
 they want to implement something in Orc.


 Agreed. The goal of the new module is to keep a clean separation between the
 code for ORC and Hive so that vectorization, sargs, and acid are kept in
 Hive and are not moved to or duplicated in the ORC project.

 .. Owen




Re: ORC separate project

2015-04-07 Thread Lefty Leverenz
Actually not so -- a spin-off project would have its own PMC and the Hive
PMC wouldn't have any say-so.  Of course, there would be some overlap of
the two PMCs.

(I'm not even sure if the PMC has governance of code, technically.  That
might belong to the committers or the development community.  Well, the PMC
does vote on release candidates so that's a kind of governance.  But the
community is supposed to decide on major issues.)

Anyway under the Apache license, nobody needs permission from the PMC to
grab some code and use it for another purpose.


-- Lefty
