[jira] [Assigned] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-4367: Assignee: Teddy Choi

enhance TRUNCATE syntax to drop data of external table

Key: HIVE-4367
URL: https://issues.apache.org/jira/browse/HIVE-4367
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.11.0
Reporter: caofangkun
Assignee: Teddy Choi
Priority: Minor

In my use case, I sometimes have to remove data from external tables to free up storage space in the cluster, so it would be useful to enhance the syntax, for example:

TRUNCATE TABLE srcpart_truncate PARTITION (dt='201130412') FORCE;

to remove data from an EXTERNAL table. I also added a configuration property that controls whether the removed data is moved to the Trash:

<property>
  <name>hive.truncate.skiptrash</name>
  <value>false</value>
  <description>If true, drop the data immediately; if false (the default), move it to the Trash.</description>
</property>

For example:

hive (default)> TRUNCATE TABLE external1 PARTITION (ds='11');
FAILED: Error in semantic analysis: Cannot truncate non-managed table external1
hive (default)> TRUNCATE TABLE external1 PARTITION (ds='11') FORCE;
[2013-04-16 17:15:52]: Compile Start
[2013-04-16 17:15:52]: Compile End
[2013-04-16 17:15:52]: OK
[2013-04-16 17:15:52]: Time taken: 0.413 seconds
hive (default)> set hive.truncate.skiptrash;
hive.truncate.skiptrash=false
hive (default)> set hive.truncate.skiptrash=true;
hive (default)> TRUNCATE TABLE external1 PARTITION (ds='12') FORCE;
[2013-04-16 17:16:21]: Compile Start
[2013-04-16 17:16:21]: Compile End
[2013-04-16 17:16:21]: OK
[2013-04-16 17:16:21]: Time taken: 0.143 seconds
hive (default)> dfs -ls /user/test/.Trash/Current/;
Found 1 items
drwxr-xr-x - test supergroup 0 2013-04-16 17:06 /user/test/.Trash/Current/ds=11

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/ ---

Review request for hive.

Description
---
https://issues.apache.org/jira/browse/HIVE-4367
This addresses bug HIVE-4367.

Diffs
---
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml.template 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1468713

Diff: https://reviews.apache.org/r/10600/diff/

Testing
---
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');
-- truncate EXTERNAL table
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';
TRUNCATE TABLE external1 FORCE;
select * from external1;

Thanks, fangkun cao
[jira] [Updated] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caofangkun updated HIVE-4367: Attachment: HIVE-4367-1.patch https://reviews.apache.org/r/10600/
[jira] [Assigned] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-4367: Assignee: (was: Teddy Choi)

This issue was unassigned, so I assigned it to myself. Then [~caofangkun] uploaded a patch for it. I don't have the rights to assign it to him, so I'll leave it unassigned. Other committers, please assign it to him. I'll review and test this patch.
[jira] [Commented] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634911#comment-13634911 ] caofangkun commented on HIVE-4367:

Hi [~teddy.choi], sorry that I did not notice you had assigned this issue to yourself when I uploaded the patch. I'm not a committer yet, so please feel free to take this issue. Thank you.
[jira] [Commented] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634914#comment-13634914 ] Teddy Choi commented on HIVE-4367:

It's okay, [~caofangkun]. And thank you for your patch. :)
[jira] [Assigned] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-4365: Assignee: Navis

wrong result in left semi join

Key: HIVE-4365
URL: https://issues.apache.org/jira/browse/HIVE-4365
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0, 0.10.0
Reporter: ransom.hezhiqiang
Assignee: Navis
Attachments: HIVE-4365.D10341.1.patch

A left semi join returns a wrong result while hive.optimize.ppd=true. For example:

1. Create the tables:
create table t1(c1 int, c2 int, c3 int, c4 int, c5 double, c6 int, c7 string) row format DELIMITED FIELDS TERMINATED BY '|';
create table t2(c1 int);

2. Load the data:
load data local inpath '/home/test/t1.txt' OVERWRITE into table t1;
load data local inpath '/home/test/t2.txt' OVERWRITE into table t2;

t1 data:
1|3|10003|52|781.96|555|201203
1|3|10003|39|782.96|555|201203
1|3|10003|87|783.96|555|201203
2|5|10004|24|789.96|555|201203
2|5|10004|58|788.96|555|201203

t2 data:
555

3. Execute the queries:
-- returns results:
select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 and t1.c1 = '1' and t1.c7 = '201203';
-- returns no results, which is wrong:
select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 where t1.c1 = '1' and t1.c7 = '201203';
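For reference, a LEFT SEMI JOIN is semantically an existence check against the right table, so both queries above should return the same three rows (the c1=1 rows with c6=555). A sketch of the equivalent EXISTS form of the second query, using the tables from the report (standard SQL shown for the semantics; Hive itself did not yet support EXISTS subqueries in these versions):

```sql
-- Equivalent EXISTS formulation of the WHERE-clause query above.
-- With correct predicate pushdown, the LEFT SEMI JOIN form must match this result.
SELECT t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7
FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.c1 = t1.c6)
  AND t1.c1 = '1'
  AND t1.c7 = '201203';
```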
[jira] [Updated] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4365: Attachment: HIVE-4365.D10341.1.patch

navis requested code review of HIVE-4365 [jira] wrong result in left semi join.
Reviewers: JIRA

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D10341

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
ql/src/test/queries/clientpositive/semijoin.q
ql/src/test/results/clientpositive/semijoin.q.out

MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/24771/

To: JIRA, navis
[jira] [Commented] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634934#comment-13634934 ] Navis commented on HIVE-4365:

Yes, it was a predicate pushdown (PPD) problem in the ReduceSink operator: the right alias of a left semi join takes all the predicates.
[jira] [Created] (HIVE-4376) Document ORC file format in Hive wiki
Lefty Leverenz created HIVE-4376: Summary: Document ORC file format in Hive wiki

Key: HIVE-4376
URL: https://issues.apache.org/jira/browse/HIVE-4376
Project: Hive
Issue Type: Bug
Components: Documentation, Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Lefty Leverenz
Assignee: Lefty Leverenz

Add a wiki page documenting the Optimized Row Columnar (ORC) file format for Hive release 0.11 ([HIVE-3874|https://issues.apache.org/jira/browse/HIVE-3874]).
Re: Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/#review19374 ---

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
https://reviews.apache.org/r/10600/#comment40085
The ST class import is needed to compile.

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
https://reviews.apache.org/r/10600/#comment40084
There is a build failure on this line.

- Teddy Choi
Re: Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/ ---

(Updated April 18, 2013, 8:19 a.m.)

Review request for hive.

Changes
---
Add truncate_table_force.q and import org.stringtemplate.v4.ST;

Description
---
https://issues.apache.org/jira/browse/HIVE-4367
This addresses bug HIVE-4367.

Diffs (updated)
---
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml.template 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/truncate_table_force.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/truncate_table_force.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/10600/diff/

Testing
---
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');
-- truncate EXTERNAL table
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';
TRUNCATE TABLE external1 FORCE;
select * from external1;

Thanks, fangkun cao
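The truncate_table_force.q file is still PRE-CREATION in this diff, so its committed contents are not shown here. Based on the queries the review exercises, a plausible sketch of such a qfile (hypothetical, not the actual file) would be:

```sql
-- truncate_table_force.q (hypothetical sketch for HIVE-4367)
-- Verify that TRUNCATE ... FORCE removes data from an EXTERNAL table.
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');

-- Truncate one partition: only ds='2008-04-08' should come back empty.
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';

-- Truncate the whole table: all partitions should come back empty.
TRUNCATE TABLE external1 FORCE;
select * from external1;

drop table external1;
```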
[jira] [Commented] (HIVE-4371) some issue with merging join trees
[ https://issues.apache.org/jira/browse/HIVE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634999#comment-13634999 ] Namit Jain commented on HIVE-4371:

I am not sure about the last test case. Why are the left alias(es) and right alias(es) not correct for that?

some issue with merging join trees

Key: HIVE-4371
URL: https://issues.apache.org/jira/browse/HIVE-4371
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Navis
Attachments: HIVE-4371.D10323.1.patch

[~navis], I would really appreciate it if you could take a look. I am attaching a test case for which, in the optimizer, the join context's left aliases and right aliases do not look correct.
Re: Review Request: New code for VectorizedRowBatch to form basis of vectorized query execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/#review19379 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40093
These comments violate the coding conventions. ColumnVector and VectorizedRowBatch have the same problem.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40092
Move the constant 1.2 to a static final float and refer to it by name.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40091
Formatting.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40090
Please throw a runtime exception here instead of relying on asserts (which can be disabled).

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java
https://reviews.apache.org/r/10592/#comment40089
Please correct the formatting issues in this file.

- Carl Steinbach

On April 18, 2013, 1:27 a.m., Eric Hanson wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/ ---
(Updated April 18, 2013, 1:27 a.m.)

Review request for hive.

Description
---
New code for VectorizedRowBatch to form the basis of vectorized query execution.
This addresses bug HIVE-4284. https://issues.apache.org/jira/browse/HIVE-4284

Diffs
---
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java PRE-CREATION

Diff: https://reviews.apache.org/r/10592/diff/

Testing
---

Thanks, Eric Hanson
[jira] [Updated] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3891: Attachment: hive.3891.14.patch

physical optimizer changes for auto sort-merge join

Key: HIVE-3891
URL: https://issues.apache.org/jira/browse/HIVE-3891
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.11.patch, hive.3891.12.patch, hive.3891.13.patch, hive.3891.14.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch
[jira] [Commented] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635005#comment-13635005 ] Namit Jain commented on HIVE-3891:

[~ashutoshc], all the tests passed. Since this was accepted some time back, can you take a look again?
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635108#comment-13635108 ] Namit Jain commented on HIVE-4095:

More comments.

Add exchange partition in Hive

Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Dheeraj Kumar Singh
Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt
hi
Hi,

Since we are developing at a very fast pace, it would be really useful to think about the maintainability and testing of this large codebase. Historically, we have not focused on a few things, and they might soon bite us. I wanted to propose the following for all checkins:

1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
2. A convention for variable/function names: do we have any?
3. If possible, the name of the test (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
4. Especially for query optimizations, it might be a good idea to have a simple working query at the top along with the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation at the top.
5. Comments in each test (.q file) that include the JIRA number, what the test is trying to verify, and the assumptions behind each query.
6. Reduce the output of each test: whenever a query outputs more than 10 results, there should be a reason for it; otherwise, each query result should be bounded by 10 rows.

In general, focusing on a lot of comments in the code will go a long way toward helping everyone follow along.

Thanks,
-namit
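As an illustration of the proposed conventions for test files, a qfile header following them might look like this (a hypothetical example; the table names and JIRA number are invented for illustration):

```sql
-- JIRA: HIVE-0000 (hypothetical). Tests that predicate pushdown does not
-- drop WHERE-clause filters on the left table of a left semi join.
-- Assumption: src_small contains fewer than 10 matching rows, so the
-- result stays within the 10-row output bound.
select s.key, s.value
from src_small s left semi join src_keys k on s.key = k.key
where s.key < 10
limit 10;
```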
[jira] [Commented] (HIVE-4304) Remove unused builtins and pdk submodules
[ https://issues.apache.org/jira/browse/HIVE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635233#comment-13635233 ] Ashutosh Chauhan commented on HIVE-4304: [~traviscrawford] Rebasing your branch after commit of HIVE-4278 and with few minor edits, I was able to build successfully. Remove unused builtins and pdk submodules - Key: HIVE-4304 URL: https://issues.apache.org/jira/browse/HIVE-4304 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Attachments: HIVE-4304.1.patch Moving from email. The [builtins|http://svn.apache.org/repos/asf/hive/trunk/builtins/] and [pdk|http://svn.apache.org/repos/asf/hive/trunk/pdk/] submodules are not believed to be in use and should be removed. The main benefits are simplification and maintainability of the Hive code base. Forwarded conversation Subject: builtins submodule - is it still needed? From: Travis Crawford traviscrawf...@gmail.com Date: Thu, Apr 4, 2013 at 2:01 PM To: u...@hive.apache.org, dev@hive.apache.org Hey hive gurus - Is the builtins hive submodule in use? The submodule was added in HIVE-2523 as a location for builtin-UDFs, but it appears to not have taken off. Any objections to removing it? DETAILS For HIVE-4278 I'm making some build changes for the HCatalog integration. The builtins submodule causes issues because it delays building until the packaging phase - so HCatalog can't depend on builtins, which it does transitively. While investigating a path forward I discovered the builtins submodule contains very little code, and likely could either go away entirely or merge into ql, simplifying things both for users and developers. Thoughts? Can anyone with context help me understand builtins, both in general and around its non-standard build? For your trouble I'll either make the submodule go away/merge into another submodule, or update the docs with what we learn. Thanks! 
Travis

--
From: Ashutosh Chauhan ashutosh.chau...@gmail.com
Date: Fri, Apr 5, 2013 at 3:10 PM
To: dev@hive.apache.org
Cc: u...@hive.apache.org

I haven't used it myself at any time till now. Nor have I met anyone who has used it or plans to use it.

Ashutosh

--
From: Gunther Hagleitner ghagleit...@hortonworks.com
Date: Fri, Apr 5, 2013 at 3:11 PM
To: dev@hive.apache.org
Cc: u...@hive.apache.org

+1. I would actually go a step further and propose to remove both PDK and builtins. I've gone through the code for both and here is what I found:

Builtins:
- BuiltInUtils.java: empty file
- UDAFUnionMap: merges maps. Doesn't seem to be useful by itself, but was intended as a building block for PDK.

PDK:
- some helper build.xml/test setup + teardown scripts
- classes/annotations to help run unit tests
- rot13 as an example

From what I can tell it's a fair assessment that it hasn't taken off; the last commits to it seem to have happened more than 1.5 years ago.

Thanks, Gunther.

--
From: Owen O'Malley omal...@apache.org
Date: Fri, Apr 5, 2013 at 4:45 PM
To: u...@hive.apache.org

+1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyway. *smile*

-- Owen

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
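The Rot13{In,Out}putFormat example mentioned above is built around the classic rot13 transform. As an illustration only (a standalone sketch, not the Hive class itself), rot13 rotates each ASCII letter 13 places, making it its own inverse:

```python
def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters pass through."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

# rot13 is its own inverse: applying it twice returns the original text.
assert rot13(rot13("Hive")) == "Hive"
```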
[jira] [Commented] (HIVE-4225) HiveServer2 does not support SASL QOP
[ https://issues.apache.org/jira/browse/HIVE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635236#comment-13635236 ] Joey Echeverria commented on HIVE-4225: --- Does it make sense to push this as is and then have a follow-up issue tied to [HIVE-4232]? HiveServer2 does not support SASL QOP - Key: HIVE-4225 URL: https://issues.apache.org/jira/browse/HIVE-4225 Project: Hive Issue Type: Bug Components: HiveServer2, Shims Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Chris Drome Fix For: 0.11.0 Attachments: HIVE-4225.patch HiveServer2 implements Kerberos authentication through SASL framework, but does not support setting QOP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Super like it.

On 4/18/13 5:31 AM, Namit Jain nj...@fb.com wrote:

Hi,

Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Historically, we have not focused on a few things, and they might soon bite us. I wanted to propose the following for all checkins:

1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
2. Convention for variable/function names: do we have any?
3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
4. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes (e.g. the operator tree for that query at each step, or a detailed explanation at the top).
5. Comments in each test (.q file) that include the JIRA number, what it is trying to test, and assumptions about each query.
6. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason; otherwise, each query result should be bounded by 10 rows.

In general, focusing on a lot of comments in the code will go a long way toward helping everyone follow along.

Thanks,
-namit
[jira] [Created] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
Gang Tim Liu created HIVE-4377: -- Summary: Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) Key: HIVE-4377 URL: https://issues.apache.org/jira/browse/HIVE-4377 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gang Tim Liu Assignee: Navis thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Specially, for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For e.g.. The operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) that should include the jira number, what is it trying to test. Assumptions about each query. 5. Reduce the output for each test whenever query is outputting more than 10 results, it should have a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Hi Namit,

I like your proposal very much and I would take it a bit further:

1. ... For any complex function, clear examples (input/output) would really help.

I'm concerned that examples in the code (comments) might very quickly become obsolete, since it can easily happen that someone changes the code without updating the example. What about using normal unit tests for this purpose? Developers will still be able to see the expected input/output, but in addition we will have an automatic way to detect (possibly incompatible) changes. Please note that I'm not suggesting abandoning the *.q file tests, just also including unit tests for complex methods.

Jarcec

On Thu, Apr 18, 2013 at 12:31:10PM +, Namit Jain wrote:
[quoted text trimmed]
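Jarcec's point, that a unit test can serve as an always-verified input/output example, can be sketched like this (hypothetical function and test, not actual Hive code):

```python
def dedup_preserve_order(items):
    """Return items with duplicates removed, keeping first occurrences."""
    seen, result = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

# The test doubles as documentation: this input/output pair is verified
# on every run, so it cannot silently go stale the way an example left
# in a code comment can.
def test_dedup_example():
    assert dedup_preserve_order([3, 1, 3, 2, 1]) == [3, 1, 2]

test_dedup_example()
```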
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4095: -- Attachment: HIVE-4095.D10347.1.patch sindheeraj requested code review of HIVE-4095 [jira] Add exchange partition in Hive. Reviewers: JIRA JIRA changes TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10347 AFFECTED FILES .gitignore metastore/if/hive_metastore.thrift metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableExchangePartition.java ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java ql/src/test/queries/clientnegative/exchange_partition_neg_incomplete_partition.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists2.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists3.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_missing.q ql/src/test/queries/clientnegative/exchange_partition_neg_table_missing.q ql/src/test/queries/clientnegative/exchange_partition_neg_table_missing2.q 
ql/src/test/queries/clientnegative/exchange_partition_neg_test.q ql/src/test/queries/clientpositive/exchange_partition.q ql/src/test/queries/clientpositive/exchange_partition2.q ql/src/test/queries/clientpositive/exchange_partition3.q ql/src/test/results/clientnegative/exchange_partition_neg_incomplete_partition.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists2.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists3.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_missing.q.out ql/src/test/results/clientnegative/exchange_partition_neg_table_missing.q.out ql/src/test/results/clientnegative/exchange_partition_neg_table_missing2.q.out ql/src/test/results/clientnegative/exchange_partition_neg_test.q.out ql/src/test/results/clientpositive/exchange_partition.q.out ql/src/test/results/clientpositive/exchange_partition2.q.out ql/src/test/results/clientpositive/exchange_partition3.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/24783/ To: JIRA, sindheeraj Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: (was: HIVE-4095.part11.patch.txt) Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: (was: HIVE-4095.part12.patch.txt) Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635282#comment-13635282 ] Phabricator commented on HIVE-4095: --- sindheeraj has abandoned the revision HIVE-4095 [jira] Add exchange partition in Hive. REVISION DETAIL https://reviews.facebook.net/D10347 To: JIRA, sindheeraj Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
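Conceptually, the exchange-partition feature moves a partition's data from one table to another, failing when the partition is missing from the source or already present in the destination (the cases the clientnegative tests in the patch cover). A toy model of that contract, with tables modeled as dicts rather than HDFS directories and metastore entries (illustrative only, not the DDLTask implementation):

```python
def exchange_partition(source, dest, partition):
    """Move one partition (key -> rows) from source table to dest table.

    Mirrors the negative tests listed in the patch: error when the
    partition already exists in the destination, or is absent from
    the source. (Toy model: real Hive moves HDFS directories and
    updates metastore entries.)
    """
    if partition in dest:
        raise ValueError("partition already exists in destination table")
    if partition not in source:
        raise ValueError("partition missing in source table")
    # The data changes ownership atomically from the caller's view.
    dest[partition] = source.pop(partition)

src = {"ds=11": ["row1", "row2"]}
dst = {}
exchange_partition(src, dst, "ds=11")
# dst now owns the partition's rows; src no longer has the partition.
```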
Re: hi
Agreed.

On 4/18/13 9:19 PM, Jarek Jarcec Cecho jar...@apache.org wrote:
[quoted text trimmed]
Re: hi
Hi,

I like the proposal as well!

On Thu, Apr 18, 2013 at 10:49 AM, Jarek Jarcec Cecho jar...@apache.org wrote:
[quoted text trimmed]

I'd be interested in including more unit tests as well. I like the existing q-file test framework, but when working on code I find that unit tests which complete in less than a second allow for much faster iteration than waiting 30 or so seconds for a q-file test to complete.

Brock
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: HIVE-4095.part11.patch.txt HIVE-4095.part12.patch.txt Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635285#comment-13635285 ] Dheeraj Kumar Singh commented on HIVE-4095: --- The revision is still https://reviews.facebook.net/D10035, I have abandoned the extra revision phabricator had created. Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Having said that, it might be difficult to write unit tests for operator trees. It might take more time initially, so making it a hard constraint might slow us down.

On 4/18/13 9:41 PM, Brock Noland br...@cloudera.com wrote:
[quoted text trimmed]
Re: hi
Agreed. Given that most of our existing tests are in .q files, I'd prefer to see more of a "unit tests highly encouraged" policy as opposed to "must have unit tests".

On Thu, Apr 18, 2013 at 11:17 AM, Namit Jain nj...@fb.com wrote:
[quoted text trimmed]

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635314#comment-13635314 ] Phabricator commented on HIVE-2340: --- njain has commented on the revision HIVE-2340 [jira] optimize orderby followed by a groupby. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 nit: spelling Abstract ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 The order in which the rules are specified matter, since in case of exact match for costs, the last rule is invoked. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 What are the semantics of trustScript ? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 can you add more comments ? REVISION DETAIL https://reviews.facebook.net/D1209 BRANCH DPAL-592 ARCANIST PROJECT hive To: JIRA, hagleitn, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Fix For: 0.11.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2019) Implement NOW() UDF
[ https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635327#comment-13635327 ] Eric Hanson commented on HIVE-2019: --- Agreed, especially with the phrase "right before executing the query". The timestamp should be obtained once at query execution startup time, not at compile time. Although these two steps are pretty much the same in Hive now, someday there could be a plan cache, and a cached NOW() result would go stale. Or, if a compilation takes a long time for some reason, NOW() could go stale. This is how it is done in one commercial DBMS that I know of. If there are multiple different flavors of date and time functions, they should all be based off the same internal high-resolution timestamp. That way they would all be consistent within one query execution if multiple functions are used, say DATE(), NOW(), etc., in the same query. Implement NOW() UDF --- Key: HIVE-2019 URL: https://issues.apache.org/jira/browse/HIVE-2019 Project: Hive Issue Type: New Feature Components: UDF Reporter: Carl Steinbach Assignee: Priyadarshini Attachments: HIVE-2019.patch Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
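The behavior Eric describes, capturing one timestamp at query-execution start and reusing it for every time function in the query, can be sketched as follows (hypothetical names, not Hive's actual implementation):

```python
import datetime

class QueryContext:
    """Captures a single timestamp when query execution starts."""
    def __init__(self):
        self.start_time = datetime.datetime.now()

def udf_now(ctx):
    # Every call within a single query returns the same instant, so
    # NOW(), DATE(), etc. stay mutually consistent, and a cached or
    # slow-to-compile plan cannot leak a stale compile-time value.
    return ctx.start_time

ctx = QueryContext()  # created at execution start, not at parse/compile time
```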
[jira] [Updated] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-4333: Attachment: HIVE-4333.1.patch.txt most windowing tests fail on hadoop 2 - Key: HIVE-4333 URL: https://issues.apache.org/jira/browse/HIVE-4333 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Matthew Weaver Attachments: HIVE-4333.1.patch.txt Problem is different order of results on hadoop 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635362#comment-13635362 ] Harish Butani commented on HIVE-4333: -

Attached a patch. The changes fall into these categories:

- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by p_mfgr'. In these cases rows within a partition do not come in the same order as in hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer 1 has 2 rows with exactly the same data, so if we use a 'row based window' there are diffs between the two. Changed to using a 'range based window'.
- There are diffs because of precision. Some of the avg and sum functions are now wrapped in 'round'.
- Finally, tests with the empty over() on functions that relied on order had to be changed, e.g. leadlag.q Query 8. I tried the following change:

{noformat}
select p_name, p_retailprice,
       lead(p_retailprice) over() as l1,
       lag(p_retailprice) over() as l2
from (select p_name, p_retailprice from part
      where p_mfgr = 'Manufacturer#1'
      order by p_name, p_retailprice) p;
{noformat}

The output in hadoop 1 is:

{noformat}
almond antique burnished rose metallic      1173.15  1173.15  NULL
almond antique burnished rose metallic      1173.15  1753.76  1173.15
almond antique chartreuse lavender yellow   1753.76  1602.59  1173.15
almond antique salmon chartreuse burlywood  1602.59  1414.42  1753.76
almond aquamarine burnished black steel     1414.42  1632.66  1602.59
almond aquamarine pink moccasin thistle     1632.66  NULL     1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and is very small, just 6 rows (so only 1 mapper is involved). In 1.0 the rows come to the reducer in the same order as the input. In hadoop 2.0 the result is:

{noformat}
almond aquamarine pink moccasin thistle     1632.66  1414.42  NULL
almond aquamarine burnished black steel     1414.42  1602.59  1632.66
almond antique salmon chartreuse burlywood  1602.59  1753.76  1414.42
almond antique chartreuse lavender yellow   1753.76  1173.15  1602.59
almond antique burnished rose metallic      1173.15  1173.15  1753.76
almond antique burnished rose metallic      1173.15  NULL     1173.15
{noformat}

Looks like the shuffle in 2.0 reorders the rows even in this case.

most windowing tests fail on hadoop 2 - Key: HIVE-4333 URL: https://issues.apache.org/jira/browse/HIVE-4333 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Matthew Weaver Attachments: HIVE-4333.1.patch.txt Problem is different order of results on hadoop 2
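The lead/lag semantics behind the diffs above can be sketched outside Hive. Over an ordered list of rows, lead looks one row ahead and lag one row behind, producing NULL (here None) where the window runs off either end, so the output is only deterministic when the row order is (a simplified standalone model, not Hive's PTF implementation):

```python
def lead(values, offset=1):
    """Value offset rows after the current row, or None past the end."""
    return [values[i + offset] if i + offset < len(values) else None
            for i in range(len(values))]

def lag(values, offset=1):
    """Value offset rows before the current row, or None before the start."""
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]

# p_retailprice in the hadoop 1 row order from the comment above:
prices = [1173.15, 1173.15, 1753.76, 1602.59, 1414.42, 1632.66]
# With a pinned row order, lead/lag output is deterministic too, which is
# why the fixed tests force the order with an explicit ORDER BY key.
```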
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635424#comment-13635424 ] Namit Jain commented on HIVE-4095: -- +1 Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4019) Ability to create and drop temporary partition function
[ https://issues.apache.org/jira/browse/HIVE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635470#comment-13635470 ] Brock Noland commented on HIVE-4019: https://reviews.facebook.net/D10353 Ability to create and drop temporary partition function --- Key: HIVE-4019 URL: https://issues.apache.org/jira/browse/HIVE-4019 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4019-1.patch, HIVE-4019.2.patch, HIVE-4019-3.patch, HIVE-4019-4.patch, hive-4019.q Just like udf/udaf/udtf functions, user should be able to add and drop custom partitioning functions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Install
Hi, how do I install Hive on my personal desktop? I am very new to Hive and my background is in legacy mainframe systems. Can you please advise in detail? Thanks, Suman
[jira] [Created] (HIVE-4378) Counters hit performance even when not used
Gunther Hagleitner created HIVE-4378: Summary: Counters hit performance even when not used Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4378: - Attachment: HIVE-4378.1.patch Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635508#comment-13635508 ] Gunther Hagleitner commented on HIVE-4378: -- https://reviews.facebook.net/D10359 Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Summary: Implement vectorized column-scalar expressions (was: Implement vectorized arithmetic expressions.) Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Implement arithmetic expressions that operate on vectors of columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4379) Implement Vectorized Column-Column expressions
Jitendra Nath Pandey created HIVE-4379: -- Summary: Implement Vectorized Column-Column expressions Key: HIVE-4379 URL: https://issues.apache.org/jira/browse/HIVE-4379 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey This covers the expressions involving two columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
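As background for these vectorization sub-tasks, the core idea of a column-column expression can be sketched outside Hive: instead of interpreting the expression once per row over boxed objects, the operator runs a tight loop over primitive arrays for one batch. The class and method names below are hypothetical, not Hive's actual API:

```java
// Illustrative sketch only: a vectorized long-column + long-column add.
// One call evaluates a whole batch, so there are no per-row virtual calls
// or boxing on the hot path.
public class LongColAddLongColSketch {
  public static void evaluate(long[] left, long[] right, long[] out, int size) {
    for (int i = 0; i < size; i++) {
      out[i] = left[i] + right[i]; // simple data-parallel loop, JIT-friendly
    }
  }
}
```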
[jira] [Updated] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4378: - Status: Patch Available (was: Open) Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4380) Implement Vectorized Scalar-Column expressions
Jitendra Nath Pandey created HIVE-4380: -- Summary: Implement Vectorized Scalar-Column expressions Key: HIVE-4380 URL: https://issues.apache.org/jira/browse/HIVE-4380 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Implement the expressions with a scalar as the first operand.
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4318: - Attachment: HIVE-4318.3.patch

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
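As a hedged sketch of the direction these fixes point in (all names here are hypothetical, not the actual HIVE-4318/HIVE-4378 patches): rather than commenting the calls out, the per-row hot path can guard the bookkeeping so that when no hooks are registered it pays only a cheap empty-check:

```java
// Illustrative guarded-forward pattern (hypothetical names): skip hook/context
// bookkeeping entirely unless at least one hook is registered.
import java.util.ArrayList;
import java.util.List;

public class GuardedForward {
  interface OperatorHook {
    void enter(Object row, int tag);
    void exit(Object row, int tag);
  }

  private final List<OperatorHook> hooks = new ArrayList<>();
  private long processedRows = 0;

  public void addHook(OperatorHook h) { hooks.add(h); }

  public void process(Object row, int tag) {
    // Cheap check: no context allocation, no hook calls when nothing is registered.
    boolean hasHooks = !hooks.isEmpty();
    if (hasHooks) {
      for (OperatorHook h : hooks) h.enter(row, tag);
    }
    processOp(row, tag);
    if (hasHooks) {
      for (OperatorHook h : hooks) h.exit(row, tag);
    }
  }

  private void processOp(Object row, int tag) { processedRows++; }

  public long getProcessedRows() { return processedRows; }
}
```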
[jira] [Created] (HIVE-4381) Implement vectorized aggregation expressions
Jitendra Nath Pandey created HIVE-4381: -- Summary: Implement vectorized aggregation expressions Key: HIVE-4381 URL: https://issues.apache.org/jira/browse/HIVE-4381 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Remus Rusanu Vectorized implementation for sum, min, max, average and count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Description: Implement arithmetic expressions involving a column and a scalar with column as first argument. (was: Implement arithmetic expressions that operate on vectors of columns.) Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635527#comment-13635527 ] Gunther Hagleitner commented on HIVE-4318: -- Thanks. I've rebased the patch and split it into two. HIVE-4378 has the changes for the counters; this one is about the operator hooks/profiler. This way I am hoping it's easier to start the work on re-introducing the profiler, because only the relevant changes are captured in this patch.

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4318: - Status: Patch Available (was: Open)

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Attachment: HIVE-4282.1.patch Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635539#comment-13635539 ] Jitendra Nath Pandey commented on HIVE-4282: A patch is uploaded. All the files in the ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/ directory are generated from a template. The template files are in ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/. The code to generate the files is in CodeGen.java. We plan to add an ant task to generate the files from the templates, which we will do in a follow-up JIRA. Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument.
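The template-expansion approach described above can be illustrated with a toy generator. The template text and all names below are invented for illustration; the real templates live under ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ and the real generator is CodeGen.java:

```java
// Toy version of template-driven code generation: expand one template string
// once per (name, type, operator) tuple to emit a per-type expression class.
public class MiniCodeGen {
  static final String TEMPLATE =
      "public class <Name>ColScalar {\n"
    + "  public <Type> eval(<Type> col, <Type> scalar) { return col <Op> scalar; }\n"
    + "}\n";

  // Substitute placeholders; a real generator would also write the file out.
  public static String generate(String name, String type, String op) {
    return TEMPLATE.replace("<Name>", name)
                   .replace("<Type>", type)
                   .replace("<Op>", op);
  }
}
```

Running it over the cross product of types and operators yields the many small generated classes seen in the gen/ directory.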
Re: Install
On Thu, Apr 18, 2013 at 11:59 PM, Suman Prabhala sumanprabh...@gmail.com wrote:
> ...please suggest me in detail manner.
Read, work out, learn :) *Thanks Regards* ∞ Shashwat Shriparv
Re: Review Request: New code for VectorizedRowBatch to form basis of vectorized query execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/ --- (Updated April 18, 2013, 7:41 p.m.) Review request for hive. Changes --- Updated based on additional code review comments. Description --- New code for VectorizedRowBatch to form basis of vectorized query execution This addresses bug HIVE-4284. https://issues.apache.org/jira/browse/HIVE-4284 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java PRE-CREATION Diff: https://reviews.apache.org/r/10592/diff/ Testing --- Thanks, Eric Hanson
[jira] [Updated] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4284: -- Attachment: HIVE-4284.5.patch modified patch with updates based on code review comments Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635591#comment-13635591 ] Eric Hanson commented on HIVE-4284: --- New diff available for review at https://reviews.apache.org/r/10592/ Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635597#comment-13635597 ] Carl Steinbach commented on HIVE-4284: -- +1 Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4350) support AS keyword for table alias
[ https://issues.apache.org/jira/browse/HIVE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4350: Assignee: Matthew Weaver support AS keyword for table alias -- Key: HIVE-4350 URL: https://issues.apache.org/jira/browse/HIVE-4350 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Thejas M Nair Assignee: Matthew Weaver The SQL standard supports an optional AS keyword when creating a table alias. http://savage.net.au/SQL/sql-92.bnf.html#table reference Hive gives an error when the optional keyword is used - select * from tiny as t1; org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: ParseException line 1:19 mismatched input 'as' expecting EOF near 'tiny'
[jira] [Commented] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635660#comment-13635660 ] Ashutosh Chauhan commented on HIVE-3891: Left some comments on Phabricator. physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.11.patch, hive.3891.12.patch, hive.3891.13.patch, hive.3891.14.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635669#comment-13635669 ] Jitendra Nath Pandey commented on HIVE-4282: The patch is up on review board. https://reviews.apache.org/r/10608/ Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635676#comment-13635676 ] Thiruvel Thirumoolan commented on HIVE-3620: [~sho.shimauchi] Did you have any special parameters for DataNucleus to get this working? I tried disabling the DataNucleus cache and also set connection pools, but that does not seem to help. I will also post a snapshot of the memory dump I have. BTW, I tried dropping a table with 45k partitions with the batch size configured to 100 and 1000. Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup
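The batching experiment mentioned above (batch sizes of 100 and 1000) amounts to splitting one huge metastore call into bounded chunks so each RPC stays under the client socket timeout. A hedged sketch of that pattern, with a hypothetical Metastore interface standing in for the real client:

```java
// Illustrative only: drop a large partition list in fixed-size batches instead
// of one monolithic call. The Metastore interface here is a stand-in, not the
// real Hive metastore client API.
import java.util.List;

public class BatchDropper {
  interface Metastore {
    void dropPartitions(List<String> partitionNames);
  }

  // Returns the number of RPC calls made; each call handles at most batchSize names.
  public static int dropInBatches(Metastore ms, List<String> parts, int batchSize) {
    int calls = 0;
    for (int i = 0; i < parts.size(); i += batchSize) {
      ms.dropPartitions(parts.subList(i, Math.min(i + batchSize, parts.size())));
      calls++;
    }
    return calls;
  }
}
```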
[jira] [Updated] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-3620: --- Attachment: Hive-3620_HeapDump.jpg Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar Attachments: Hive-3620_HeapDump.jpg hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4278) HCat needs to get current Hive jars instead of pulling them from maven repo
[ https://issues.apache.org/jira/browse/HIVE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635700#comment-13635700 ] Hudson commented on HIVE-4278: -- Integrated in Hive-trunk-hadoop2 #165 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/165/]) HIVE-4278 : HCat needs to get current Hive jars instead of pulling them from maven repo (Sushanth Sowmyan via Ashutosh Chauhan) (Revision 1469348) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1469348 Files : * /hive/trunk/beeline/ivy.xml * /hive/trunk/build-common.xml * /hive/trunk/build.properties * /hive/trunk/cli/ivy.xml * /hive/trunk/hcatalog/build-support/ant/deploy.xml * /hive/trunk/hcatalog/build.properties * /hive/trunk/hcatalog/core/pom.xml * /hive/trunk/hcatalog/hcatalog-pig-adapter/pom.xml * /hive/trunk/hcatalog/pom.xml * /hive/trunk/hcatalog/server-extensions/pom.xml * /hive/trunk/hcatalog/storage-handlers/hbase/pom.xml * /hive/trunk/hcatalog/webhcat/java-client/pom.xml * /hive/trunk/hcatalog/webhcat/svr/pom.xml * /hive/trunk/hwi/ivy.xml * /hive/trunk/ql/build.xml * /hive/trunk/ql/ivy.xml HCat needs to get current Hive jars instead of pulling them from maven repo --- Key: HIVE-4278 URL: https://issues.apache.org/jira/browse/HIVE-4278 Project: Hive Issue Type: Sub-task Components: Build Infrastructure, HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Sushanth Sowmyan Priority: Blocker Fix For: 0.11.0 Attachments: HIVE-4278.approach2.patch, HIVE-4278.approach2.patch.2.for.branch.11, HIVE-4278.approach2.patch.2.for.branch.12, HIVE-4278.approach2.patch.3.for.branch.12, HIVE-4278.D10257.1.patch, HIVE-4278.D9981.1.patch The HCatalog build is currently pulling Hive jars from the maven repo instead of using the ones built as part of the current build. Now that it is part of Hive it should use the jars being built instead of pulling them from maven. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3620: --- Attachment: HIVE-3620 Heapdump detail.png Just to add even more detail, this leak-report indicates Datanucleus's ConnectionFactoryImpl seems to retain a majority of the memory being leaked (440 MB, in this case). Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar Attachments: HIVE-3620 Heapdump detail.png, Hive-3620_HeapDump.jpg hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4200: - Attachment: HIVE-4200.2.patch Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing there own thing. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635734#comment-13635734 ] Carl Steinbach commented on HIVE-4305: -- bq. Does ivy have a completely offline mode? That is what I am most interested in and haven't been able to find it. For example, ivy.cache.ttl.default=eternal doesn't stop the downloading. [~brocknoland] I have good news and bad news. The good news is that Ivy supports completely offline builds via the resolver's useCacheOnly property. I updated my patch for HIVE-4200 with these changes and verified that offline builds work with the network cable pulled out. The bad news is that the HCatalog build is still doing its own thing and doesn't respect the offline flag, so to make this work I had to remove hcatalog from the submodule lists in build.properties. I plan to fix this over the weekend by switching hcatalog over to Ivy. Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
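As a rough illustration of the offline switch Carl describes (the target and property names below are invented, not the actual HIVE-4200 wiring), an Ant build can pass an offline flag down to Ivy's resolve step, which supports a useCacheOnly option to resolve strictly from the local cache:

```xml
<!-- Illustrative sketch only: project/target/property names are hypothetical. -->
<project xmlns:ivy="antlib:org.apache.ivy.ant" name="demo" default="resolve">
  <property name="offline" value="false"/>
  <target name="resolve">
    <!-- With -Doffline=true, Ivy will not touch the network. -->
    <ivy:resolve file="ivy.xml" useCacheOnly="${offline}"/>
  </target>
</project>
```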
[jira] [Comment Edited] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635723#comment-13635723 ] Carl Steinbach edited comment on HIVE-4200 at 4/18/13 9:48 PM: --- Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing their own thing. was (Author: cwsteinbach): Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing there own thing. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4284. Resolution: Fixed Committed to vectorization branch. Thanks, Eric! Thanks Jitendra and Carl for reviewing! Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635782#comment-13635782 ] Gunther Hagleitner commented on HIVE-4103: -- I took some time to test out the two versions of the code. I ran a number of mapjoins ranging from small to at the limit and finally over the limit. In summary: Without the gc calls we overestimate the used memory very slightly. The biggest one I've seen is ~1%. The errors btw always cause the estimates to be more conservative, never less. The performance benefit on the other hand is quite substantial: On that large run it went from 120s to 56s with Gopals patch. I think we should move forward with this. Largest run: With Patch: {noformat} 2013-04-18 05:29:36 Starting to launch local task to process map join; maximum memory = 1065484288 2013-04-18 05:29:42 Processing rows:20 Hashtable size: 19 Memory usage: 108807528 rate: 0.102 2013-04-18 05:29:44 Processing rows:30 Hashtable size: 29 Memory usage: 158575416 rate: 0.149 2013-04-18 05:29:46 Processing rows:40 Hashtable size: 39 Memory usage: 211033848 rate: 0.198 2013-04-18 05:29:48 Processing rows:50 Hashtable size: 49 Memory usage: 260673400 rate: 0.245 2013-04-18 05:29:50 Processing rows:60 Hashtable size: 59 Memory usage: 310156256 rate: 0.291 2013-04-18 05:29:53 Processing rows:70 Hashtable size: 69 Memory usage: 359750536 rate: 0.338 2013-04-18 05:29:54 Processing rows:80 Hashtable size: 79 Memory usage: 417989768 rate: 0.392 2013-04-18 05:29:57 Processing rows:90 Hashtable size: 89 Memory usage: 460568536 rate: 0.432 2013-04-18 05:29:58 Processing rows:100 Hashtable size: 99 Memory usage: 510475320 rate: 0.479 2013-04-18 05:30:01 Processing rows:110 Hashtable size: 109 Memory usage: 559513584 rate: 0.525 2013-04-18 05:30:03 Processing rows:120 Hashtable size: 119 Memory usage: 609277088 rate: 0.572 2013-04-18 05:30:06 Processing rows:130 Hashtable size: 129 Memory usage: 659366968 rate: 0.619 
2013-04-18 05:30:07 Processing rows:140 Hashtable size: 139 Memory usage: 708744832 rate: 0.665 2013-04-18 05:30:08 Processing rows:150 Hashtable size: 149 Memory usage: 758335688 rate: 0.712 2013-04-18 05:30:13 Processing rows:160 Hashtable size: 159 Memory usage: 825625224 rate: 0.775 2013-04-18 05:30:14 Processing rows:1646400 Hashtable size: 1646400 Memory usage: 848652056 rate: 0.796 2013-04-18 05:30:14 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable 2013-04-18 05:30:32 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable File size: 127593266 2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec. {noformat} Without patch: {noformat} 2013-04-18 05:55:22 Starting to launch local task to process map join; maximum memory = 1065484288 2013-04-18 05:55:29 Processing rows:20 Hashtable size: 19 Memory usage: 108779608 rate: 0.102 2013-04-18 05:55:33 Processing rows:30 Hashtable size: 29 Memory usage: 157203744 rate: 0.148 2013-04-18 05:55:37 Processing rows:40 Hashtable size: 39 Memory usage: 208667552 rate: 0.196 2013-04-18 05:55:42 Processing rows:50 Hashtable size: 49 Memory usage: 258126352 rate: 0.242 2013-04-18 05:55:46 Processing rows:60 Hashtable size: 59 Memory usage: 307734104 rate: 0.289 2013-04-18 05:55:51 Processing rows:70 Hashtable size: 69 Memory usage: 357043768 rate: 0.335 2013-04-18 05:55:57 Processing rows:80 Hashtable size: 79 Memory usage: 415059928 rate: 0.39 2013-04-18 05:56:04 Processing rows:90 Hashtable size: 89 Memory usage: 460135344 rate: 0.432 2013-04-18 05:56:10 Processing rows:100 Hashtable size: 99 Memory usage: 509690176 rate: 0.478 2013-04-18 05:56:18 Processing rows:110 Hashtable size: 109 Memory usage: 559042448 rate: 0.525 2013-04-18 05:56:25 Processing rows:120 Hashtable size: 119 Memory usage: 608652728 rate: 0.571 2013-04-18 
05:56:33 Processing rows:130 Hashtable
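The "rate" column in these logs comes from the JVM's own memory accounting. A minimal sketch of the computation (class and method names are illustrative, not Hive's actual local-task code) shows why dropping the System.gc() calls only makes the estimate more conservative: without a collection beforehand, freeMemory() still counts uncollected garbage as used, so the rate can only be overestimated, matching the ~1% error observed above.

```java
public class MemoryRateSketch {
    // Illustrative version of the local task's memory check; Hive's real
    // implementation lives in the map-join local-task loop.
    static double usedMemoryRate() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                      // "maximum memory" in the logs above
        long used = rt.totalMemory() - rt.freeMemory(); // includes uncollected garbage
        // Without a prior System.gc(), 'used' can only err on the high side,
        // so the estimate is conservative, never optimistic.
        return (double) used / max;
    }

    public static void main(String[] args) {
        System.out.println("memory usage rate: " + usedMemoryRate());
    }
}
```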
[jira] [Assigned] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reassigned HIVE-4305: Assignee: Carl Steinbach Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Assignee: Carl Steinbach Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4266: - Priority: Blocker (was: Major) Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2055) Hive HBase Integration issue
[ https://issues.apache.org/jira/browse/HIVE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated HIVE-2055: - Attachment: HIVE-2055.patch HIVE-2055.patch fixes the bin/hive script to include the HBase and HCat libs in the classpath Hive HBase Integration issue Key: HIVE-2055 URL: https://issues.apache.org/jira/browse/HIVE-2055 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: sajith v Attachments: HIVE-2055.patch Created an external table in Hive which points to an HBase table. When I tried to query a column using the column name in the select clause, I got the following exception: ( java.lang.ClassNotFoundException: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat), errorCode:12, SQLState:42000) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635805#comment-13635805 ] Carl Steinbach commented on HIVE-4266: -- Marking this a blocker for 0.11.0. bq. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. I think this will create more problems than it fixes. Permanently supporting these shell classes will be a long-term maintenance burden and headache for all involved. The other option is to add them temporarily, but what does that really accomplish? I think for most folks upgrading to the new namespace should be as simple as running this command on their source tree: {noformat} % perl -p -i.bak -e 's|org\.apache\.hcatalog|org.apache.hive.hcatalog|g' `find . -name '*.java'` {noformat} Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4284: - Labels: VectorEngine (was: ) Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Labels: VectorEngine Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4357) BeeLine tests are not getting executed
[ https://issues.apache.org/jira/browse/HIVE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635816#comment-13635816 ] Rob Weltman commented on HIVE-4357: --- Test udf7 in ql (probably one of the last ones in ql) fails for me because of conflicting junit versions in the CLASSPATH. I had to do a little trickery to get past that but could then verify that the beeline tests are executed and pass. BeeLine tests are not getting executed -- Key: HIVE-4357 URL: https://issues.apache.org/jira/browse/HIVE-4357 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Carl Steinbach Assignee: Rob Weltman Fix For: 0.11.0 Attachments: HIVE-4357.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4357) BeeLine tests are not getting executed
[ https://issues.apache.org/jira/browse/HIVE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635818#comment-13635818 ] Carl Steinbach commented on HIVE-4357: -- +1. Will commit if tests pass. BeeLine tests are not getting executed -- Key: HIVE-4357 URL: https://issues.apache.org/jira/browse/HIVE-4357 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Carl Steinbach Assignee: Rob Weltman Fix For: 0.11.0 Attachments: HIVE-4357.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635827#comment-13635827 ] Alan Gates commented on HIVE-4266: -- We cannot make this kind of backwards-incompatible change for users. Users will not see this as "here, run this script against your source tree." They'll see it as having to modify, re-test, and re-deploy every application. We should not make this a blocker for 0.11. I'm 90% of the way through the patch, but it will take a fair amount of testing when I'm done to ensure that it works with both org.apache.hcatalog and org.apache.hive.hcatalog. Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635831#comment-13635831 ] Carl Steinbach commented on HIVE-4305: -- Owen, I think you just won the debate! In Ant you have to type -Doffline=true. I tried figuring out how many extra characters that is but kept losing count. Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Assignee: Carl Steinbach Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635833#comment-13635833 ] Gunther Hagleitner commented on HIVE-4200: -- Thanks [~cwsteinbach]. Just saw this. I'll take a look tonight. Feel free to take over the jira, if it seems I am becoming the bottleneck on this. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635852#comment-13635852 ] Dheeraj Kumar Singh commented on HIVE-4095: --- HIVE-4095.part12.patch.txt and HIVE-4095.part11.patch.txt are the two relevant patches. HIVE-4095.part11.patch.txt has the diff that was put up and HIVE-4095.part12.patch.txt has the thrift changes. Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Attachment: HIVE-4282.2.patch Uploaded a new patch fixing style issues. Also uploaded on the review board. https://reviews.apache.org/r/10608/ Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch, HIVE-4282.2.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635922#comment-13635922 ] Ashutosh Chauhan commented on HIVE-4318: +1. Will commit if tests pass. OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. Below are timings for a count(1) query with and without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query. 
The modification made to test this was:
{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
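Rather than deleting the calls outright, one lower-risk option is to guard the per-row hook and counter work behind a cheap emptiness check, so the cost vanishes when no hooks are registered. The sketch below is illustrative only: the class, interface, and method names are invented for the example, not the actual Operator.java API, and this is not the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of guarding per-row hook dispatch; names are illustrative.
public class GuardedOperatorSketch {
    interface OperatorHook {
        void enter(Object row, int tag);
        void exit(Object row, int tag);
    }

    private final List<OperatorHook> hooks = new ArrayList<>();
    long rowsProcessed = 0;

    void process(Object row, int tag) {
        // A single branch per row is far cheaper than unconditionally
        // allocating a context object and walking an empty hook list.
        boolean hasHooks = !hooks.isEmpty();
        if (hasHooks) {
            for (OperatorHook h : hooks) h.enter(row, tag);
        }
        rowsProcessed++; // stands in for processOp(row, tag)
        if (hasHooks) {
            for (OperatorHook h : hooks) h.exit(row, tag);
        }
    }
}
```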
[jira] [Commented] (HIVE-4371) some issue with merging join trees
[ https://issues.apache.org/jira/browse/HIVE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635934#comment-13635934 ] Navis commented on HIVE-4371: - [~namit] Ran all tests and passed. I cannot see anything wrong in aliases of QBJoinTree. {noformat} TS1(b)-RS1\ TS2(c)-RS2-JOIN1-RS4\ TS3(a)-RS3/ JOIN2 TS4(d)-RS5/ JOIN2 (L=null, R=d, Ls=[a,b,c], Base=d) {noformat} In this, posBig should be 0(d) or 1(null), not 2(c) in other join context. some issue with merging join trees -- Key: HIVE-4371 URL: https://issues.apache.org/jira/browse/HIVE-4371 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Namit Jain Assignee: Navis Attachments: HIVE-4371.D10323.1.patch [~navis], I would really appreciate if you can take a look. I am attaching a testcase, for which in the optimizer the join context left aliases and right aliases do not look correct. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4095: - Attachment: hive.4095.1.patch Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: hive.4095.1.patch, HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2817) Drop any table even without privilege
[ https://issues.apache.org/jira/browse/HIVE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2817: -- Attachment: HIVE-2817.D10371.1.patch chenchun requested code review of HIVE-2817 [jira] Drop any table even without privilege. Reviewers: JIRA HIVE-2817 Drop any table even without privilege You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds The table and its file in /usr/hive/warehouse, or the external file, will be deleted. If you don't have Hadoop access permission on /usr/hive/warehouse or external files, you will see a Hadoop access error 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10371 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java ql/src/test/queries/clientnegative/authorization_fail_8.q ql/src/test/queries/clientpositive/authorization_8.q ql/src/test/results/clientnegative/authorization_fail_8.q.out ql/src/test/results/clientpositive/authorization_8.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/24831/ To: JIRA, chenchun Drop any table even without privilege - Key: HIVE-2817 URL: https://issues.apache.org/jira/browse/HIVE-2817 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Benyi Wang Attachments: HIVE-2817.D10371.1.patch You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. {code} hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds {code} The table and its file in {{/usr/hive/warehouse}}, or the external file, will be deleted. If you don't have Hadoop access permission on {{/usr/hive/warehouse}} or external files, you will see a Hadoop access error {code} 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2817) Drop any table even without privilege
[ https://issues.apache.org/jira/browse/HIVE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Chun reassigned HIVE-2817: --- Assignee: Chen Chun Drop any table even without privilege - Key: HIVE-2817 URL: https://issues.apache.org/jira/browse/HIVE-2817 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Benyi Wang Assignee: Chen Chun Attachments: HIVE-2817.D10371.1.patch You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. {code} hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds {code} The table and its file in {{/usr/hive/warehouse}}, or the external file, will be deleted. If you don't have Hadoop access permission on {{/usr/hive/warehouse}} or external files, you will see a Hadoop access error {code} 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636073#comment-13636073 ] Phabricator commented on HIVE-2340: --- navis has commented on the revision HIVE-2340 [jira] optimize orderby followed by a groupby. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 When a ScriptOperator exists between RSs, it might be possible to dedup only if the script does not change the schema, the order of rows, or the values of the RS-related columns. It seems it was added for that case by He Yongqiang, the initial developer of this optimizer. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 Added comments ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 ok. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 done. REVISION DETAIL https://reviews.facebook.net/D1209 BRANCH DPAL-592 ARCANIST PROJECT hive To: JIRA, hagleitn, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Fix For: 0.11.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before 
implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[ https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4377: -- Attachment: HIVE-4377.D10377.1.patch navis requested code review of HIVE-4377 [jira] Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340). Reviewers: JIRA HIVE-4377 Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For example, the operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) should include the JIRA number, what it is trying to test, and assumptions about each query. 5. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10377 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/24849/ To: JIRA, navis Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) -- Key: HIVE-4377 URL: https://issues.apache.org/jira/browse/HIVE-4377 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gang Tim Liu Assignee: Navis Attachments: HIVE-4377.D10377.1.patch thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For example, the operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) should include the JIRA number, what it is trying to test, and assumptions about each query. 5. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[ https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4377:
------------------------
    Status: Patch Available  (was: Open)

Initial comments for review

Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
------------------------------------------------------------------

                Key: HIVE-4377
                URL: https://issues.apache.org/jira/browse/HIVE-4377
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
           Reporter: Gang Tim Liu
           Assignee: Navis
        Attachments: HIVE-4377.D10377.1.patch

Thanks a lot for addressing the optimization in HIVE-2340. Awesome!
Since we are developing at a very fast pace, it would be really useful to think about the maintainability and testing of this large codebase. Highlights applicable to D1209:

1. Javadoc for all public/private functions, except setters/getters. For any complex function, clear input/output examples would really help.
2. Especially for query optimizations, it might be a good idea to put a simple working query at the top together with the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation.
3. If possible, the name of the test (.q file) where the function is invoked, or, for a query-processor change, the query that would exercise that scenario.
4. Comments in each test (.q file) stating the JIRA number, what it is trying to test, and the assumptions behind each query.
5. Reduce the output of each test: whenever a query outputs more than 10 results there should be a reason; otherwise each query result should be bounded by 10 rows.

Thanks a lot.
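As a concrete illustration of points 1 and 2 above, a helper documented in the suggested style might look like the sketch below. The class, method, and logic are hypothetical examples invented for illustration; they are not taken from ReduceSinkDeDuplication.java or the D10377 patch.

```java
import java.util.List;

// Hypothetical helper, shown only to illustrate the Javadoc conventions
// requested in this review (not actual Hive code).
public final class KeyColumnUtils {

  /**
   * Returns true if the child key columns are a prefix of the parent key
   * columns, i.e. the child's ordering is already guaranteed by the parent
   * and the two operators could be merged.
   *
   * <p>Examples: parent keys [key, value], child keys [key] -> true;
   * parent keys [key], child keys [value] -> false.
   *
   * @param parentKeys key column names of the parent operator
   * @param childKeys  key column names of the child operator
   * @return whether childKeys is a (possibly equal) prefix of parentKeys
   */
  public static boolean isPrefix(List<String> parentKeys, List<String> childKeys) {
    if (childKeys.size() > parentKeys.size()) {
      return false;
    }
    for (int i = 0; i < childKeys.size(); i++) {
      if (!childKeys.get(i).equals(parentKeys.get(i))) {
        return false;
      }
    }
    return true;
  }

  private KeyColumnUtils() {
  }
}
```

The in-doc input/output examples make the contract checkable at a glance, which is what guideline 1 asks for on complex functions.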
[jira] [Commented] (HIVE-4278) HCat needs to get current Hive jars instead of pulling them from maven repo
[ https://issues.apache.org/jira/browse/HIVE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636103#comment-13636103 ]

Hudson commented on HIVE-4278:
------------------------------

Integrated in Hive-trunk-h0.21 #2070 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2070/])
HIVE-4278 : HCat needs to get current Hive jars instead of pulling them from maven repo (Sushanth Sowmyan via Ashutosh Chauhan) (Revision 1469348)

Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1469348
Files :
* /hive/trunk/beeline/ivy.xml
* /hive/trunk/build-common.xml
* /hive/trunk/build.properties
* /hive/trunk/cli/ivy.xml
* /hive/trunk/hcatalog/build-support/ant/deploy.xml
* /hive/trunk/hcatalog/build.properties
* /hive/trunk/hcatalog/core/pom.xml
* /hive/trunk/hcatalog/hcatalog-pig-adapter/pom.xml
* /hive/trunk/hcatalog/pom.xml
* /hive/trunk/hcatalog/server-extensions/pom.xml
* /hive/trunk/hcatalog/storage-handlers/hbase/pom.xml
* /hive/trunk/hcatalog/webhcat/java-client/pom.xml
* /hive/trunk/hcatalog/webhcat/svr/pom.xml
* /hive/trunk/hwi/ivy.xml
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml

HCat needs to get current Hive jars instead of pulling them from maven repo
---------------------------------------------------------------------------

                Key: HIVE-4278
                URL: https://issues.apache.org/jira/browse/HIVE-4278
            Project: Hive
         Issue Type: Sub-task
         Components: Build Infrastructure, HCatalog
   Affects Versions: 0.11.0
           Reporter: Alan Gates
           Assignee: Sushanth Sowmyan
           Priority: Blocker
            Fix For: 0.11.0
        Attachments: HIVE-4278.approach2.patch, HIVE-4278.approach2.patch.2.for.branch.11, HIVE-4278.approach2.patch.2.for.branch.12, HIVE-4278.approach2.patch.3.for.branch.12, HIVE-4278.D10257.1.patch, HIVE-4278.D9981.1.patch

The HCatalog build is currently pulling Hive jars from the maven repo instead of using the ones built as part of the current build. Now that HCatalog is part of Hive, the build should use the jars being built instead of pulling them from maven.
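For the pom.xml files touched by this change, the general Maven pattern for depending on jars from the current build rather than a released version is to pin sibling-module dependencies to `${project.version}`, so the reactor supplies the artifact. A minimal sketch follows; the coordinates are illustrative and this is not the actual HIVE-4278 patch:

```xml
<!-- Sketch only: depend on the hive-exec jar built in this source tree,
     not a released version pulled from the maven repository. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <!-- ${project.version} resolves to the version of the current build,
       so the multi-module reactor provides the jar instead of the repo. -->
  <version>${project.version}</version>
</dependency>
```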
[jira] [Commented] (HIVE-4342) NPE for query involving UNION ALL with nested JOIN and UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636106#comment-13636106 ]

Navis commented on HIVE-4342:
-----------------------------

I've tried with trunk and got various exceptions. With the default configuration,
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Big Table Alias is null
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinLocalWork(MapJoinProcessor.java:217)
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:232)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:245)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:372)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:553)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8387)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

org.apache.hadoop.hive.ql.parse.SemanticException: Generate New MapJoin Opertor Exeception Big Table Alias is null
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:242)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:245)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:372)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:553)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8387)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at