Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-17 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35576/
---

Review request for hive, Gunther Hagleitner and John Pullokkaran.


Bugs: HIVE-11028
https://issues.apache.org/jira/browse/HIVE-11028


Repository: hive-git


Description
---

Change TezCompiler to only run short-cutting of expressions rather than full 
constant folding.


Diffs
-

  itests/src/test/resources/testconfiguration.properties b9f39fb
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af
  ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/35576/diff/


Testing
---

qfile test added


Thanks,

Jason Dere



Re: Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-17 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35576/#review88269
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java (line 46)
https://reviews.apache.org/r/35576/#comment140696

As we discussed, couldn't we move foldExpr from ConstantPropagate to 
ExprNodeDescUtils?

Then DPP could use ExprNodeDescUtils directly without depending on 
ConstantProp.

IMO, it's a better separation.


- John Pullokkaran


On June 17, 2015, 6:38 p.m., Jason Dere wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35576/
 ---
 
 (Updated June 17, 2015, 6:38 p.m.)
 
 
 Review request for hive, Gunther Hagleitner and John Pullokkaran.
 
 
 Bugs: HIVE-11028
 https://issues.apache.org/jira/browse/HIVE-11028
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Change TezCompiler to only run short-cutting of expressions rather than full 
 constant folding.
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties b9f39fb
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d
   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af
   ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION
   ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION
 
 Diff: https://reviews.apache.org/r/35576/diff/
 
 
 Testing
 ---
 
 qfile test added
 
 
 Thanks,
 
 Jason Dere
 




Re: PL/HQL and Hive

2015-06-17 Thread Dmitry Tolpeko
Alan,

HPL/SQL is a good name, I am OK with this change. Right now I am the only
developer of PL/HQL. What status will I be given in the Hive project,
so I can continue developing the tool? I will read the docs and try to create a
patch.

Thanks,

Dmitry

On Wed, Jun 17, 2015 at 9:55 PM, Alan Gates alanfga...@gmail.com wrote:

 Here's what we need to do:

 1) You need to file a JIRA proposing to contribute the code.
 2) You can then contribute the code as a patch to that JIRA.  As long as
 you've written all the code yourself this is sufficient to hand legal
 rights to Apache to contribute the code.  If others beyond you have legal
 claim to the code (i.e., they wrote it or paid you to write it) we'll need to
 work with Apache and those authors to get clearance to include the code.
 3) Before committing the code we need to move it to an org.apache.hive
 packaging structure.  I propose that we put it in a new package
 org.apache.hive.hplsql (see below for why I chose that).  We can take the
 patch you submit and make this change before committing or you can move it
 yourself before you contribute the patch.
 4) One of the current committers can then take the patch and get it
 committed.

 One suggestion that might be controversial:  I propose we change the name
 from PL/HQL to HPL/SQL (hence my packaging name suggestion above).  We want
 to move away from saying Hive has a language called HQL which is SQL-like.
 At this point Hive's SQL is most of the way to SQL-92, so talking about HQL
 just confuses people.  Hence Hive PL/SQL (HPL/SQL) seems better.  Or if you
 prefer we could do PL/HSQL.

 Alan.

   Dmitry Tolpeko dmtolp...@gmail.com
  June 15, 2015 at 8:03
 Hi Alan,

 I am back from my vacation. Please let me know what actions or information
 are required from me regarding IP. Can we talk about JIRA creation and the first
 steps to make PL/HQL conform to Hive standards?

 Thanks,

 Dmitry







Re: Review Request 34897: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD

2015-06-17 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34897/
---

(Updated June 17, 2015, 5:54 p.m.)


Review request for hive and Jesús Camacho Rodríguez.


Changes
---

Address all the comments.


Repository: hive-git


Description
---

In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, when 
aliases contains the empty string "" and key is the empty string "" too, the check 
concludes that aliases contains key. This triggers incorrect PPD. To reproduce it, 
apply the HIVE-10455 patch and run cbo_subq_notin.q.
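
To make the failure mode concrete, here is a minimal sketch of how an empty
table alias can satisfy a plain membership test. The class and method names
are hypothetical and are not taken from OpProcFactory; the guarded variant is
only one possible way to avoid the false match, whereas the diff under review
touches the Calcite translator classes listed below.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not Hive's OpProcFactory code.
final class AliasCheckSketch {

    // Naive check: an empty key "matches" as soon as the alias set holds "".
    static boolean naiveMatch(Set<String> aliases, String key) {
        return aliases.contains(key);
    }

    // Guarded check: a null or empty key is treated as "no alias" and never matches.
    static boolean guardedMatch(Set<String> aliases, String key) {
        return key != null && !key.isEmpty() && aliases.contains(key);
    }

    public static void main(String[] args) {
        Set<String> aliases = new HashSet<>(Arrays.asList("", "t1"));
        System.out.println(naiveMatch(aliases, ""));     // empty alias is reported as present
        System.out.println(guardedMatch(aliases, ""));   // empty alias is ignored
        System.out.println(guardedMatch(aliases, "t1")); // real alias still matches
    }
}
```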


Diffs (updated)
-

  
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java 9c21238
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java e7c8342

Diff: https://reviews.apache.org/r/34897/diff/


Testing
---


Thanks,

pengcheng xiong



Re: PL/HQL and Hive

2015-06-17 Thread Alan Gates

Here's what we need to do:

1) You need to file a JIRA proposing to contribute the code.
2) You can then contribute the code as a patch to that JIRA.  As long as 
you've written all the code yourself this is sufficient to hand legal 
rights to Apache to contribute the code.  If others beyond you have 
legal claim to the code (i.e., they wrote it or paid you to write it) we'll 
need to work with Apache and those authors to get clearance to include 
the code.
3) Before committing the code we need to move it to an org.apache.hive 
packaging structure.  I propose that we put it in a new package 
org.apache.hive.hplsql (see below for why I chose that).  We can take 
the patch you submit and make this change before committing or you can 
move it yourself before you contribute the patch.
4) One of the current committers can then take the patch and get it 
committed.


One suggestion that might be controversial:  I propose we change the 
name from PL/HQL to HPL/SQL (hence my packaging name suggestion above).  
We want to move away from saying Hive has a language called HQL which is 
SQL-like.  At this point Hive's SQL is most of the way to SQL-92, so 
talking about HQL just confuses people.  Hence Hive PL/SQL (HPL/SQL) 
seems better.  Or if you prefer we could do PL/HSQL.


Alan.


Dmitry Tolpeko mailto:dmtolp...@gmail.com
June 15, 2015 at 8:03
Hi Alan,

I am back from my vacation. Please let me know what actions or 
information are required from me regarding IP. Can we talk about JIRA 
creation and the first steps to make PL/HQL conform to Hive standards?


Thanks,

Dmitry






Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly

2015-06-17 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35532/
---

(Updated June 17, 2015, 10:47 p.m.)


Review request for hive.


Repository: hive-git


Description
---

HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the 
value against NULL value incorrectly


Diffs (updated)
-

  data/files/emp2.txt 650aff7
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2
  ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf
  ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f

Diff: https://reviews.apache.org/r/35532/diff/


Testing
---


Thanks,

Aihua Xu



Re: PL/HQL and Hive

2015-06-17 Thread Alan Gates
In Apache projects there are contributors and committers.  Contributors 
are anyone who helps with the project via code, docs, tests, bug 
reports, etc.  Committers can commit code, though it must still be 
reviewed by other committers.  On the process of becoming a committer in 
Hive see 
https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter  
Obviously contributing a large bit of functionality starts you on that 
road nicely.


If you need help getting the patch together let me know.

Alan.


Dmitry Tolpeko mailto:dmtolp...@gmail.com
June 17, 2015 at 13:02
Alan,

HPL/SQL is a good name, I am OK with this change. Right now I am the 
only developer of PL/HQL. What status will I be given in the Hive 
project, so I can continue developing the tool? I will read the docs and 
try to create a patch.


Thanks,

Dmitry


Alan Gates mailto:alanfga...@gmail.com
June 17, 2015 at 11:55
Here's what we need to do:

1) You need to file a JIRA proposing to contribute the code.
2) You can then contribute the code as a patch to that JIRA.  As long 
as you've written all the code yourself this is sufficient to hand 
legal rights to Apache to contribute the code.  If others beyond you 
have legal claim to the code (i.e., they wrote it or paid you to write 
it) we'll need to work with Apache and those authors to get clearance 
to include the code.
3) Before committing the code we need to move it to an org.apache.hive 
packaging structure.  I propose that we put it in a new package 
org.apache.hive.hplsql (see below for why I chose that).  We can take 
the patch you submit and make this change before committing or you can 
move it yourself before you contribute the patch.
4) One of the current committers can then take the patch and get it 
committed.


One suggestion that might be controversial:  I propose we change the 
name from PL/HQL to HPL/SQL (hence my packaging name suggestion 
above).  We want to move away from saying Hive has a language called 
HQL which is SQL-like.  At this point Hive's SQL is most of the way to 
SQL-92, so talking about HQL just confuses people.  Hence Hive PL/SQL 
(HPL/SQL) seems better.  Or if you prefer we could do PL/HSQL.


Alan.

Dmitry Tolpeko mailto:dmtolp...@gmail.com
June 15, 2015 at 8:03
Hi Alan,

I am back from my vacation. Please let me know what actions or 
information are required from me regarding IP. Can we talk about JIRA 
creation and the first steps to make PL/HQL conform to Hive standards?


Thanks,

Dmitry


Dmitry Tolpeko mailto:dmtolp...@gmail.com
June 2, 2015 at 12:35
Alan,

I am new to the Hive project structure and development process, so I 
would highly appreciate your guidance (for example, if you could open the 
JIRA or tell me how to do that). I can also grant the software to Apache if 
required, although I am not sure which IP clearance is required. For me, 
uploading the code is sufficient.


Thank you,

Dmitry


Alan Gates mailto:alanfga...@gmail.com
June 1, 2015 at 15:50
Dmitry,

I'm thrilled to hear that you're open to integrating PL/HQL into Hive.

As for how we'd do it, this is obviously something we'll have to 
discuss in the community on the dev list.  But my initial thought is 
that we start by importing it as is, mostly focusing on package name 
changes, etc.  So it starts as a standalone tool.  Then over time we work 
on integrating it directly into Hive.  This will have a number of 
benefits for users as they'll be able to create and store procedures, 
invoke them from JDBC connections, grant and revoke access to 
procedures, etc.


So I think the next step is to open a JIRA on it and then we can start 
building a patch to contribute the code.


Given that PL/HQL has already been released as a separate entity I'm 
not sure if we need additional IP clearance (i.e., you have to sign a 
grant) or if uploading the code to a JIRA is sufficient.  Do any 
of the Hive PMC know?


No worries if you can't respond until June 12, there's no rush.  
Enjoy your vacation.


Alan.





[jira] [Created] (HIVE-11039) Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming

2015-06-17 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11039:
---

 Summary: Write a tool to allow people with 
datanucleus.identifierFactory=datanucleus2 to migrate their metastore to 
datanucleus1 naming
 Key: HIVE-11039
 URL: https://issues.apache.org/jira/browse/HIVE-11039
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical


We hit an interesting bug in a case where datanucleus.identifierFactory = 
datanucleus2 .

The problem is that directSql hand-generates SQL strings assuming the datanucleus1 
naming scheme. If a user has their metastore JDO managed by 
datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are 
incorrect.

One simple example of what this results in is the following: whenever DN 
persists a field which is held as a List<T>, it winds up storing each T as a 
separate row in the appropriate mapping table, and has a column called 
INTEGER_IDX, which holds the position in the list. Then, upon reading, it 
automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which 
results in the list retaining its order. In the DN2 naming scheme, the column is 
called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool 
upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX 
and IDX.

Whenever they use JDO, such as with all writes, it will then use the IDX field, 
and when they do any sort of optimized reads, such as through directSQL, it 
will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is 
stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX 
will contain 0,0,0,0,... and thus any attempt to describe the table or fetch 
the schema for the table can come back ordered by the table's native hashing order 
rather than sorted by the index.

This can then result in the schema ordering being different from the actual table. 
For example, if a user has a table (a:int, b:string, c:string), a describe on it may 
return (c:string, a:int, b:string), and thus queries which insert 
after selecting from another table can hit ClassCastExceptions when trying to 
insert data in the wrong order - this is how we discovered this bug. This 
problem, however, can be far worse if there are no type problems: it is 
possible, for example, that if a, b and c were all strings, the insert query would 
succeed but mix up the order, which then results in user table data being mixed 
up. This has the potential to be very bad.
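
The ordering problem described above can be reproduced outside the metastore
with a small sketch. Everything here is hypothetical (the Row class, the
sample storage order, the method names); it only assumes what the report
states, namely that writes fill IDX while INTEGER_IDX stays 0, and that reads
sort by INTEGER_IDX. Sorting on a key that is identical for every row leaves
the rows in whatever order storage returned them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a list-mapping table; not metastore code.
final class ListIndexOrderSketch {

    static final class Row {
        final String column;   // e.g. "a:int"
        final int integerIdx;  // DN1-style position column (INTEGER_IDX)
        final int idx;         // DN2-style position column (IDX)

        Row(String column, int integerIdx, int idx) {
            this.column = column;
            this.integerIdx = integerIdx;
            this.idx = idx;
        }
    }

    static final Comparator<Row> BY_INTEGER_IDX = Comparator.comparingInt((Row r) -> r.integerIdx);
    static final Comparator<Row> BY_IDX = Comparator.comparingInt((Row r) -> r.idx);

    // Rows in an arbitrary storage order; IDX was written, INTEGER_IDX was left at 0.
    static List<Row> sample() {
        return Arrays.asList(
            new Row("c:string", 0, 2),
            new Row("a:int", 0, 0),
            new Row("b:string", 0, 1));
    }

    // Emulates "SELECT ... ORDER BY <column>" with a stable sort.
    static List<String> readOrderedBy(List<Row> stored, Comparator<Row> order) {
        List<Row> rows = new ArrayList<>(stored);
        rows.sort(order);
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(r.column);
        }
        return out;
    }

    public static void main(String[] args) {
        // All keys are 0, so the storage order leaks through unchanged.
        System.out.println(readOrderedBy(sample(), BY_INTEGER_IDX));
        // Sorting by the column that was actually written restores the list order.
        System.out.println(readOrderedBy(sample(), BY_IDX));
    }
}
```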

We should write a tool to help convert metastores that use datanucleus2 to 
datanucleus1 (more difficult, needs more one-time testing) or change directSql 
to support both (easier to code, but increases the test-coverage matrix 
significantly, and we should really then be testing against both schemes). But 
in the short term, we should disable directSql if we see that the 
identifierFactory is datanucleus2.
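
As a rough sketch of the short-term guard proposed above: the check below
reads the identifier factory from a plain java.util.Properties object and
turns direct SQL off when it is datanucleus2. The class and method names are
hypothetical, the use of Properties instead of Hive's configuration classes is
a simplification, and the fallback value "datanucleus1" for an unset property
is an assumption.

```java
import java.util.Properties;

// Hypothetical guard; not the metastore's actual configuration code.
final class DirectSqlGuardSketch {

    static final String IDENTIFIER_FACTORY = "datanucleus.identifierFactory";

    // Returns true only if direct SQL was requested and the naming scheme is not DN2.
    static boolean directSqlAllowed(Properties conf, boolean directSqlRequested) {
        // Assumption: an unset property means the datanucleus1 naming scheme.
        String factory = conf.getProperty(IDENTIFIER_FACTORY, "datanucleus1");
        boolean dn2 = "datanucleus2".equalsIgnoreCase(factory.trim());
        return directSqlRequested && !dn2;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(directSqlAllowed(conf, true)); // unset, treated as DN1

        conf.setProperty(IDENTIFIER_FACTORY, "datanucleus2");
        System.out.println(directSqlAllowed(conf, true)); // DN2, direct SQL disabled
    }
}
```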



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35576: HIVE-11028 Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-17 Thread Jason Dere


 On June 17, 2015, 7:39 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java, 
  line 46
  https://reviews.apache.org/r/35576/diff/1/?file=986551#file986551line46
 
  As we discussed, couldn't we move foldExpr from ConstantPropagate to 
  ExprNodeDescUtils?
  
  Then DPP could use ExprNodeDescUtils directly without depending on 
  ConstantProp.
  
  IMO, it's a better separation.

It looks like ConstantPropagate does additional things that we actually want to 
be done, such as deleting filter operators when the filter expression 
consists of a single constant True expression. We end up with those kinds of 
situations from the dynamic partition pruning. I believe it would be easier, 
and a more generic and reusable solution, to use ConstantPropagate here than to 
have to re-implement similar logic in DynamicPartitionPrunerProc.
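
To illustrate the idea (not the patch itself), here is a minimal sketch of
short-cutting a boolean expression tree and then testing whether the filter
has collapsed to a constant True. The Expr classes and method names are
hypothetical stand-ins and do not correspond to ExprNodeDesc or to the
ConstantPropagate API; "short-cutting" is read here as simplifying AND/OR
around boolean constants without evaluating anything else.

```java
// Hypothetical expression model; not Hive's ExprNodeDesc hierarchy.
final class ShortcutFoldSketch {

    interface Expr {}

    static final class Const implements Expr {
        final boolean value;
        Const(boolean value) { this.value = value; }
    }

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
    }

    static final class And implements Expr {
        final Expr left, right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Or implements Expr {
        final Expr left, right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final Const TRUE = new Const(true);
    static final Const FALSE = new Const(false);

    static boolean isConst(Expr e, boolean v) {
        return e instanceof Const && ((Const) e).value == v;
    }

    // Simplifies AND/OR around boolean constants; leaves everything else untouched.
    static Expr shortcut(Expr e) {
        if (e instanceof And) {
            Expr l = shortcut(((And) e).left);
            Expr r = shortcut(((And) e).right);
            if (isConst(l, false) || isConst(r, false)) return FALSE;
            if (isConst(l, true)) return r;
            if (isConst(r, true)) return l;
            return new And(l, r);
        }
        if (e instanceof Or) {
            Expr l = shortcut(((Or) e).left);
            Expr r = shortcut(((Or) e).right);
            if (isConst(l, true) || isConst(r, true)) return TRUE;
            if (isConst(l, false)) return r;
            if (isConst(r, false)) return l;
            return new Or(l, r);
        }
        return e;
    }

    // A filter whose predicate reduces to constant True passes every row and can be dropped.
    static boolean filterIsRedundant(Expr predicate) {
        return isConst(shortcut(predicate), true);
    }

    public static void main(String[] args) {
        Expr col = new Column("part_col");
        System.out.println(filterIsRedundant(new And(TRUE, new Or(col, TRUE)))); // collapses to True
        System.out.println(filterIsRedundant(new And(TRUE, col)));               // still depends on part_col
    }
}
```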


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35576/#review88269
---


On June 17, 2015, 6:38 p.m., Jason Dere wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35576/
 ---
 
 (Updated June 17, 2015, 6:38 p.m.)
 
 
 Review request for hive, Gunther Hagleitner and John Pullokkaran.
 
 
 Bugs: HIVE-11028
 https://issues.apache.org/jira/browse/HIVE-11028
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Change TezCompiler to only run short-cutting of expressions rather than full 
 constant folding.
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties b9f39fb
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java 0027960
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcCtx.java 6bb2a09
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java 4a4814d
   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 56707af
   ql/src/test/queries/clientpositive/tez_self_join.q PRE-CREATION
   ql/src/test/results/clientpositive/tez/tez_self_join.q.out PRE-CREATION
 
 Diff: https://reviews.apache.org/r/35576/diff/
 
 
 Testing
 ---
 
 qfile test added
 
 
 Thanks,
 
 Jason Dere
 




Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly

2015-06-17 Thread Ashutosh Chauhan


 On June 17, 2015, 4:08 p.m., Ashutosh Chauhan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, 
  line 1240
  https://reviews.apache.org/r/35532/diff/1/?file=985909#file985909line1240
 
  This doesn't seem right. isGreater() (as opposed to isEqual()) is not 
  symmetric w.r.t. the order of its two arguments. E.g., consider v1 = 23 and v2 = 
  NULL: this call will return v1 > v2. However, if v1 = NULL and v2 = 23, it 
  will still return v1 > v2. Either NULLs should always be greater or always 
  be smaller, otherwise this has the potential to generate an incorrect result set.
 
 Aihua Xu wrote:
 The name isGreater() probably is a little misleading. It actually means 
 if the distance of second value to the first value is greater than the given 
 amt. When v1 = 23 and v2 = null, the distance is considered greater than 10 
 (actually any value) since 23 and null are not comparable; the same for v1 = 
 null and v2 = 23. v1 = null and v2 = null are considered less than 10 since 
 they are both null and the distance is 0.
 
 I can change the name to mean what it means, like isDistanceGreater(), so 
 that we won't be confused with isEqual() (which is what I initially did as 
 well); or add some comments. Suggestions?

I see. Yeah, change the name and add comments too.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35532/#review88235
---


On June 16, 2015, 8:13 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35532/
 ---
 
 (Updated June 16, 2015, 8:13 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing 
 the value against NULL value incorrectly
 
 
 Diffs
 -
 
   data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88
   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2dc864c38a2969909efa5b21508e27d7f8
   ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf45e3c1e0b928800dae0470e8acfd77734
   ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc
 
 Diff: https://reviews.apache.org/r/35532/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly

2015-06-17 Thread Aihua Xu


 On June 17, 2015, 4:08 p.m., Ashutosh Chauhan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, 
  line 1240
  https://reviews.apache.org/r/35532/diff/1/?file=985909#file985909line1240
 
  This doesn't seem right. isGreater() (as opposed to isEqual()) is not 
  symmetric w.r.t. the order of its two arguments. E.g., consider v1 = 23 and v2 = 
  NULL: this call will return v1 > v2. However, if v1 = NULL and v2 = 23, it 
  will still return v1 > v2. Either NULLs should always be greater or always 
  be smaller, otherwise this has the potential to generate an incorrect result set.

The name isGreater() is probably a little misleading. It actually checks whether the 
distance from the second value to the first value is greater than the given amt. When 
v1 = 23 and v2 = null, the distance is considered greater than 10 (actually any 
value) since 23 and null are not comparable; the same goes for v1 = null and v2 = 
23. v1 = null and v2 = null are considered less than 10 since they are both 
null and the distance is 0.

I can change the name to say what it means, like isDistanceGreater(), so that 
it won't be confused with isEqual() (which is what I initially did as well); or 
add some comments. Suggestions?
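
For reference, the semantics described above can be written down as a small
standalone sketch. This is not the code from WindowingTableFunction: the class
name is hypothetical, java.math.BigDecimal stands in for Hive's decimal type,
the subtraction order v1 - v2 is an assumption, and amt is assumed to be
non-negative.

```java
import java.math.BigDecimal;

// Hypothetical restatement of the null handling described in the reply.
final class DistanceCheckSketch {

    // Is the distance between v1 and v2 greater than amt?
    static boolean isDistanceGreater(BigDecimal v1, BigDecimal v2, BigDecimal amt) {
        if (v1 != null && v2 != null) {
            // Both present: ordinary numeric comparison of the difference.
            return v1.subtract(v2).compareTo(amt) > 0;
        }
        // Exactly one null: the values are not comparable, treated as farther apart than any amt.
        // Both null: the distance is taken as 0, so it is not greater than a non-negative amt.
        return v1 != null || v2 != null;
    }

    public static void main(String[] args) {
        BigDecimal ten = BigDecimal.TEN;
        BigDecimal v = new BigDecimal("23");
        System.out.println(isDistanceGreater(v, null, ten));    // one side null
        System.out.println(isDistanceGreater(null, v, ten));    // other side null, same answer
        System.out.println(isDistanceGreater(null, null, ten)); // both null
    }
}
```

Unlike an ordering comparison, the null cases here give the same answer for
(23, null) and (null, 23), which is the behaviour the reply describes.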


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35532/#review88235
---


On June 16, 2015, 8:13 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35532/
 ---
 
 (Updated June 16, 2015, 8:13 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing 
 the value against NULL value incorrectly
 
 
 Diffs
 -
 
   data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88
   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 32471f2dc864c38a2969909efa5b21508e27d7f8
   ql/src/test/queries/clientpositive/windowing_windowspec3.q 608a6cf45e3c1e0b928800dae0470e8acfd77734
   ql/src/test/results/clientpositive/windowing_windowspec3.q.out 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc
 
 Diff: https://reviews.apache.org/r/35532/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




[jira] [Created] (HIVE-11038) MiniTezCli tests are hanging

2015-06-17 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-11038:


 Summary: MiniTezCli tests are hanging
 Key: HIVE-11038
 URL: https://issues.apache.org/jira/browse/HIVE-11038
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 2.0.0
Reporter: Wei Zheng
Priority: Blocker


Whenever running a MiniTezCli test, it just hangs.

Here's the maven command to run a test:
{code}
$ mvn test -Phadoop-2 -Dtest=TestMiniTezCliDriver -Dqfile=dynamic_partition_pruning.q
{code}
Here's the tail of org.apache.hadoop.hive.cli.TestMiniTezCliDriver-output.txt:
{code}
Status: Running (Executing on YARN cluster with App id application_1434574617753_0001)

Map 1: -/-  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
POSTHOOK: query: analyze table lineitem compute statistics for columns
POSTHOOK: type: QUERY
POSTHOOK: Input: default@lineitem
POSTHOOK: Output: file:/Users/wzheng/bf/hive/itests/qtest/target/tmp/localscratchdir/c684ea6a-11b1-4253-a529-c3778695b72a/hive_2015-06-17_13-57-19_047_1275844087077606719-1/-mr-1
OK
Time taken: 0.387 seconds
Begin query: dynamic_partition_pruning.q
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/Users/wzheng/bf/hive/conf/ivysettings.xml will be used
{code}
And here's the jstack output (partial):
{code}
main #1 prio=5 os_prio=31 tid=0x7fc75e805800 nid=0x1303 waiting on condition [0x000101d84000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:378)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:168)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1657)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1416)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1197)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1033)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1007)
at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.runTest(TestMiniTezCliDriver.java:146)
at org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning(TestMiniTezCliDriver.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

VM Thread os_prio=31 tid=0x7fc75e830800 nid=0x3103 runnable

GC task thread#0 (ParallelGC) os_prio=31 tid=0x7fc75e811800 nid=0x2103 runnable

GC task thread#1 (ParallelGC) os_prio=31 tid=0x7fc75f00 nid=0x2303 runnable

GC task thread#2 (ParallelGC) os_prio=31 tid=0x7fc75f001000 nid=0x2503 runnable

GC task thread#3 (ParallelGC) os_prio=31 tid=0x7fc75f80 nid=0x2703 runnable

GC task thread#4 (ParallelGC) os_prio=31 tid=0x7fc75f801000 nid=0x2903 

Review Request 35582: HIVE-11029:hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be impersonated

2015-06-17 Thread Na Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35582/
---

Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-11029
https://issues.apache.org/jira/browse/HIVE-11029


Repository: hive-git


Description
---

Currently the Hive session UGI uses the createRemoteUser API instead of the 
createProxyUser API in unsecured mode. As a result, the impersonated user is not 
passed to the JobTracker/ResourceManager, so hadoop.proxyuser.mapr.groups does 
not restrict the groups that can be impersonated: any impersonated user 
can launch a MapReduce job.

The fix is to replace the createRemoteUser API with the createProxyUser API.
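
The difference between the two kinds of user can be pictured with a
deliberately simplified model. This is not Hadoop's UserGroupInformation or
its proxy-user authorization code; the classes, the maps and the sample users
below are all hypothetical. The sketch only assumes what the description
states: a plain remote user carries no information about who is impersonating
it, so a group restriction keyed on the impersonating user has nothing to
apply to.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the proxy-user group restriction.
final class ProxyUserCheckSketch {

    static final class SessionUser {
        final String user;      // the user the job runs as
        final String realUser;  // the impersonating service user, or null for a plain remote user

        SessionUser(String user, String realUser) {
            this.user = user;
            this.realUser = realUser;
        }
    }

    static SessionUser remote(String user) {
        return new SessionUser(user, null);
    }

    static SessionUser proxy(String user, String realUser) {
        return new SessionUser(user, realUser);
    }

    // The restriction applies only when the session records who is impersonating.
    static boolean mayRun(SessionUser u,
                          Map<String, Set<String>> groupsOfUser,
                          Map<String, Set<String>> allowedGroupsOfProxy) {
        if (u.realUser == null) {
            return true; // no impersonation recorded, so no group restriction is checked
        }
        Set<String> allowed = allowedGroupsOfProxy.getOrDefault(u.realUser, Collections.<String>emptySet());
        for (String g : groupsOfUser.getOrDefault(u.user, Collections.<String>emptySet())) {
            if (allowed.contains(g)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = new HashMap<>();
        groups.put("alice", Collections.singleton("analysts"));
        groups.put("bob", Collections.singleton("interns"));
        Map<String, Set<String>> allowed = new HashMap<>();
        allowed.put("mapr", Collections.singleton("analysts"));

        System.out.println(mayRun(remote("bob"), groups, allowed));        // restriction bypassed
        System.out.println(mayRun(proxy("bob", "mapr"), groups, allowed)); // restriction enforced
        System.out.println(mayRun(proxy("alice", "mapr"), groups, allowed));
    }
}
```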


Diffs
-

  
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 56af643

Diff: https://reviews.apache.org/r/35582/diff/


Testing
---


Thanks,

Na Yang



questions about hive CBO

2015-06-17 Thread wangzhenhua (G)
Hi all,

I'm reading the source code of Hive CBO (CalcitePlanner), but I find it hard 
to follow.
Listed below are some of the questions:
1. What's the relationship between HepPlanner and HiveVolcanoPlanner?
2. I don't have a clue about these concepts: clusters, traitDef and 
collectGarbage().

Thanks for any help.


best regards,
-zhenhua


Re: questions about hive CBO

2015-06-17 Thread John Pullokkaran
HepPlanner is a greedy planner and VolcanoPlanner is a more exhaustive
planner.
RelOptCluster captures the environment for planning; it holds on to the type factory,
metadata provider, ...

Having said that, these are just the plumbing required to explore plan
alternatives.
CalcitePlanner, the metadata providers, and RelOptHiveTable are some of the key
pieces you need to understand.

On 6/17/15, 6:03 PM, wangzhenhua (G) wangzhen...@huawei.com wrote:

Hi all,

I'm reading the source code of Hive CBO (CalcitePlanner), but I find it
hard to follow.
Listed below are some of the questions:
1. What's the relationship between HepPlanner and HiveVolcanoPlanner?
2. I don't have a clue about these concepts: clusters, traitDef and
collectGarbage().

Thanks for any help.


best regards,
-zhenhua



[jira] [Created] (HIVE-11040) Change Derby dependency version to 10.10.2.0

2015-06-17 Thread Jason Dere (JIRA)
Jason Dere created HIVE-11040:
-

 Summary: Change Derby dependency version to 10.10.2.0
 Key: HIVE-11040
 URL: https://issues.apache.org/jira/browse/HIVE-11040
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere


We don't see this on the Apache pre-commit tests because it uses PTest, but 
running the entire TestCliDriver suite results in failures in some of the 
partition-related qtests (partition_coltype_literals, partition_date, 
partition_date2). I've only really seen this on Linux (I was using CentOS).

HIVE-8879 changed the Derby dependency version from 10.10.1.1 to 10.11.1.1. 
Testing with 10.10.1.1 or 10.10.2.0 seems to allow the partition-related tests 
to pass. I'd like to change the dependency version to 10.10.2.0, since that 
version should also contain the fix for HIVE-8879.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 34897: CBO: Calcite Operator To Hive Operator (Calcite Return Path) Empty tabAlias in columnInfo which triggers PPD

2015-06-17 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34897/#review88242
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java (line 976)
https://reviews.apache.org/r/34897/#comment140666

Is your intention here to change the table alias in the schema of the child 
too?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java (line 55)
https://reviews.apache.org/r/34897/#comment140664

It seems that joinOpToAlias is never used. Could it be removed?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java (line 110)
https://reviews.apache.org/r/34897/#comment140665

Blank space


- Jesús Camacho Rodríguez


On June 16, 2015, 9:55 p.m., pengcheng xiong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34897/
 ---
 
 (Updated June 16, 2015, 9:55 p.m.)
 
 
 Review request for hive and Ashutosh Chauhan.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 In ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 477, 
 when aliases contains an empty string and key is an empty string too, it 
 assumes that aliases contains key. This will trigger incorrect PPD. To 
 reproduce it, apply HIVE-10455 and run cbo_subq_notin.q.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverter.java
  9c21238 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/HiveOpConverterPostProc.java
  e7c8342 
 
 Diff: https://reviews.apache.org/r/34897/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 pengcheng xiong
 




Hive-0.14 - Build # 987 - Still Failing

2015-06-17 Thread Apache Jenkins Server
Changes for Build #980

Changes for Build #981

Changes for Build #982

Changes for Build #983

Changes for Build #984

Changes for Build #985

Changes for Build #986

Changes for Build #987



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #987)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-0.14/987/ to view 
the results.

[jira] [Created] (HIVE-11036) Race condition in DataNucleus makes Metastore to hang

2015-06-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-11036:
---

 Summary: Race condition in DataNucleus makes Metastore to hang
 Key: HIVE-11036
 URL: https://issues.apache.org/jira/browse/HIVE-11036
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0, 1.0.0, 0.14.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Under a moderate to high concurrent query workload, the Metastore gets 
deadlocked in DataNucleus.





[jira] [Created] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-17 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-11037:
--

 Summary: HiveOnTez: make explain user level = true as default
 Key: HIVE-11037
 URL: https://issues.apache.org/jira/browse/HIVE-11037
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would 
like to enable it by default.





Re: Review Request 34757: HIVE-10844: Combine equivalent Works for HoS[Spark Branch]

2015-06-17 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34757/
---

(Updated June 17, 2015, 8:59 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

improve the compare algorithm and update qfile output


Bugs: HIVE-10844
https://issues.apache.org/jira/browse/HIVE-10844


Repository: hive-git


Description
---

Some Hive queries (like TPCDS Q39) may share the same subquery, which is 
translated into separate but equivalent Works in SparkWork. Combining these 
equivalent Works into a single one would let the query benefit from the 
dynamic RDD caching optimization that follows.
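
As a rough illustration of the idea only (not the code in this patch), equivalent Works can be found by grouping on a structural signature and keeping the first Work of each group. The Work class, its operator list, and the combine() method below are hypothetical names invented for this sketch; the actual resolver compares operator trees in more detail.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch; these types are hypothetical and are not Hive classes. */
final class EquivalentWorkSketch {
    private EquivalentWorkSketch() {}

    /** Hypothetical stand-in for a Work: a name plus its chain of operator names. */
    static final class Work {
        final String name;
        final List<String> operators;

        Work(String name, List<String> operators) {
            this.name = name;
            this.operators = operators;
        }
    }

    /**
     * Maps every Work name to the name of the first Work seen with an equal
     * operator chain. A Work that maps to itself is kept; any other Work
     * could be replaced by its representative.
     */
    static Map<String, String> combine(List<Work> works) {
        // List.equals/hashCode compare contents, so the operator chain can be the key.
        Map<List<String>, String> representative = new LinkedHashMap<>();
        Map<String, String> replacement = new LinkedHashMap<>();
        for (Work w : works) {
            String kept = representative.putIfAbsent(w.operators, w.name);
            replacement.put(w.name, kept == null ? w.name : kept);
        }
        return replacement;
    }
}
```

With three Works where the first two share the same operator chain, combine() maps the second Work to the first and leaves the third mapped to itself.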


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinCondDesc.java b307b16 
  ql/src/test/results/clientpositive/spark/auto_join30.q.out 7b5c5e7 
  ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out 8a43d78 
  ql/src/test/results/clientpositive/spark/groupby10.q.out 9d3cf36 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out abd6459 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out 5e69b31 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 3418b99 
  
ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out
 2cb126d 
  ql/src/test/results/clientpositive/spark/groupby8.q.out 307395f 
  ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out ba04a57 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 7df5ba8 
  ql/src/test/results/clientpositive/spark/join22.q.out b1e5b67 
  ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 8a278ef 
  ql/src/test/results/clientpositive/spark/union10.q.out 5e8fe38 
  ql/src/test/results/clientpositive/spark/union11.q.out 20c27c7 
  ql/src/test/results/clientpositive/spark/union20.q.out 6f0dca6 
  ql/src/test/results/clientpositive/spark/union28.q.out 98582df 
  ql/src/test/results/clientpositive/spark/union3.q.out 834b6d4 
  ql/src/test/results/clientpositive/spark/union30.q.out 3409623 
  ql/src/test/results/clientpositive/spark/union4.q.out c121ef0 
  ql/src/test/results/clientpositive/spark/union5.q.out afee988 
  ql/src/test/results/clientpositive/spark/union_remove_1.q.out ba0e293 
  ql/src/test/results/clientpositive/spark/union_remove_15.q.out 26cfbab 
  ql/src/test/results/clientpositive/spark/union_remove_16.q.out 7a7aaf2 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out a5e15c5 
  ql/src/test/results/clientpositive/spark/union_remove_19.q.out ad44400 
  ql/src/test/results/clientpositive/spark/union_remove_20.q.out 1d67177 
  ql/src/test/results/clientpositive/spark/union_remove_21.q.out 9f5b070 
  ql/src/test/results/clientpositive/spark/union_remove_22.q.out 2e01432 
  ql/src/test/results/clientpositive/spark/union_remove_24.q.out 2659798 
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 0a94684 
  ql/src/test/results/clientpositive/spark/union_remove_4.q.out 6c3d596 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out cd36189 
  ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out c981ae4 
  ql/src/test/results/clientpositive/spark/union_remove_7.q.out 084fbd6 
  ql/src/test/results/clientpositive/spark/union_top_level.q.out dede1ef 

Diff: https://reviews.apache.org/r/34757/diff/


Testing
---


Thanks,

chengxiang li



Re: Review Request 35532: HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly

2015-06-17 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35532/#review88235
---



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java (line 
1238)
https://reviews.apache.org/r/35532/#comment140647

This doesn't seem right. isGreater() (as opposed to isEqual()) is not 
symmetric w.r.t. the order of its two arguments. E.g., consider v1 = 23 and 
v2 = NULL: this call will return v1 > v2. However, if v1 = NULL and v2 = 23, 
it will still return v1 > v2. Either NULLs should always be greater or always 
be smaller; otherwise this has the potential to generate an incorrect result 
set.
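
A minimal sketch of a consistent NULL ordering, assuming NULL is treated as smaller than every non-NULL value (the opposite convention would work equally well as long as it is applied on both sides). The class name and the use of BigDecimal are assumptions for illustration; this is not the WindowingTableFunction code.

```java
import java.math.BigDecimal;

/** Illustrative helper; hypothetical, not part of Hive. */
final class NullAwareDecimalCompare {
    private NullAwareDecimalCompare() {}

    /**
     * Returns true when v1 sorts strictly after v2, with NULL (Java null)
     * treated as smaller than every non-NULL value.
     */
    static boolean isGreater(BigDecimal v1, BigDecimal v2) {
        if (v1 == null) {
            return false;   // NULL is never greater than anything, including NULL
        }
        if (v2 == null) {
            return true;    // any non-NULL value is greater than NULL
        }
        return v1.compareTo(v2) > 0;
    }
}
```

With this convention, isGreater(23, NULL) is true and isGreater(NULL, 23) is false, so swapping the arguments flips the answer as expected.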


- Ashutosh Chauhan


On June 16, 2015, 8:13 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35532/
 ---
 
 (Updated June 16, 2015, 8:13 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-11025 In windowing spec, when the datatype is decimal, it's comparing 
 the value against NULL value incorrectly
 
 
 Diffs
 -
 
   data/files/emp2.txt 650aff7f2c8003fb7c04dfa377c2b25d04f3ce88 
   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java 
 32471f2dc864c38a2969909efa5b21508e27d7f8 
   ql/src/test/queries/clientpositive/windowing_windowspec3.q 
 608a6cf45e3c1e0b928800dae0470e8acfd77734 
   ql/src/test/results/clientpositive/windowing_windowspec3.q.out 
 42c042f2cf80f0a5a8269ad9eb9864d7e76525cc 
 
 Diff: https://reviews.apache.org/r/35532/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




[jira] [Created] (HIVE-11035) PPD: Orc Split elimination fails because filterColumns=[-1]

2015-06-17 Thread Gopal V (JIRA)
Gopal V created HIVE-11035:
--

 Summary: PPD: Orc Split elimination fails because 
filterColumns=[-1]
 Key: HIVE-11035
 URL: https://issues.apache.org/jira/browse/HIVE-11035
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


{code}
create temporary table xx (x int) stored as orc ;
insert into xx values (20),(200);
set hive.fetch.task.conversion=none;
select * from xx where x is null;
{code}

This should generate zero tasks after the optional split elimination in the 
app master, instead of generating 1 task that is certain to hit the row-index 
filters and remove all rows anyway.

Right now, this runs 1 task for the stripe containing (min=20, max=200, 
has_null=false), which is broken.

Instead of eliminating the split, it returns YES_NO_NULL from the following 
default case:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L976
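
The expected behaviour can be sketched as follows. This is only an illustration under simplifying assumptions: the Truth enum, ColumnStats, and both methods are hypothetical names, not the ORC reader's classes, and the truth values are reduced to three states.

```java
/** Illustrative sketch; these types are hypothetical and are not Hive's ORC classes. */
final class IsNullSplitSketch {
    private IsNullSplitSketch() {}

    /** Simplified truth values; the real evaluator has more states, such as YES_NO_NULL. */
    enum Truth { YES, NO, YES_NO }

    /** Hypothetical per-stripe column statistics. */
    static final class ColumnStats {
        final long min;
        final long max;
        final boolean hasNull;

        ColumnStats(long min, long max, boolean hasNull) {
            this.min = min;
            this.max = max;
            this.hasNull = hasNull;
        }
    }

    /** Evaluates "column IS NULL" from statistics alone; null stats mean "unknown". */
    static Truth evaluateIsNull(ColumnStats stats) {
        if (stats == null) {
            return Truth.YES_NO;   // no statistics available: cannot rule the stripe out
        }
        // Without any NULLs in the stripe, IS NULL cannot match a single row.
        return stats.hasNull ? Truth.YES_NO : Truth.NO;
    }

    /** A stripe can be skipped only when the predicate is certainly false. */
    static boolean canSkip(Truth truth) {
        return truth == Truth.NO;
    }
}
```

For the stripe above (min=20, max=200, has_null=false) this sketch evaluates to NO, so the stripe could be skipped; when no statistics can be looked up for the column, the result stays undecided and the stripe must be read.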


