[jira] [Assigned] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-4367: Assignee: Teddy Choi

enhance TRUNCATE syntax to drop data of external table

Key: HIVE-4367
URL: https://issues.apache.org/jira/browse/HIVE-4367
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.11.0
Reporter: caofangkun
Assignee: Teddy Choi
Priority: Minor

In my use case, I sometimes have to remove data from external tables to free up storage space in the cluster, so it would be useful to enhance the syntax, for example:

TRUNCATE TABLE srcpart_truncate PARTITION (dt='201130412') FORCE;

to remove data from an EXTERNAL table. I also added a configuration property that controls whether the removed data is moved to the Trash:

<property>
  <name>hive.truncate.skiptrash</name>
  <value>false</value>
  <description>If true, drop the data immediately; if false (the default), move it to the Trash.</description>
</property>

For example:

hive (default)> TRUNCATE TABLE external1 PARTITION (ds='11');
FAILED: Error in semantic analysis: Cannot truncate non-managed table external1
hive (default)> TRUNCATE TABLE external1 PARTITION (ds='11') FORCE;
[2013-04-16 17:15:52]: Compile Start
[2013-04-16 17:15:52]: Compile End
[2013-04-16 17:15:52]: OK
[2013-04-16 17:15:52]: Time taken: 0.413 seconds
hive (default)> set hive.truncate.skiptrash;
hive.truncate.skiptrash=false
hive (default)> set hive.truncate.skiptrash=true;
hive (default)> TRUNCATE TABLE external1 PARTITION (ds='12') FORCE;
[2013-04-16 17:16:21]: Compile Start
[2013-04-16 17:16:21]: Compile End
[2013-04-16 17:16:21]: OK
[2013-04-16 17:16:21]: Time taken: 0.143 seconds
hive (default)> dfs -ls /user/test/.Trash/Current/;
Found 1 items
drwxr-xr-x - test supergroup 0 2013-04-16 17:06 /user/test/.Trash/Current/ds=11

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/ ---

Review request for hive.

Description
---
https://issues.apache.org/jira/browse/HIVE-4367
This addresses bug HIVE-4367.

Diffs
---
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml.template 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 1468713
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1468713

Diff: https://reviews.apache.org/r/10600/diff/

Testing
---
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');
-- truncate EXTERNAL table
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';
TRUNCATE TABLE external1 FORCE;
select * from external1;

Thanks, fangkun cao
[jira] [Updated] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caofangkun updated HIVE-4367: Attachment: HIVE-4367-1.patch https://reviews.apache.org/r/10600/
[jira] [Assigned] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-4367: Assignee: (was: Teddy Choi)

This issue was unassigned, so I assigned it to myself. Then [~caofangkun] uploaded a patch for it. I don't have the rights to assign it to him, so I'll leave it unassigned. Other committers, please assign it to him. I'll review and test this patch.
[jira] [Commented] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634911#comment-13634911 ] caofangkun commented on HIVE-4367:

Hi [~teddy.choi], sorry that I did not notice you had assigned this issue to yourself when I uploaded the patch. I'm not a committer yet, so please feel free to take this issue. Thank you.
[jira] [Commented] (HIVE-4367) enhance TRUNCATE syntax to drop data of external table
[ https://issues.apache.org/jira/browse/HIVE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634914#comment-13634914 ] Teddy Choi commented on HIVE-4367:

It's okay, [~caofangkun]. And thank you for your patch. :)
[jira] [Assigned] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-4365: Assignee: Navis

wrong result in left semi join

Key: HIVE-4365
URL: https://issues.apache.org/jira/browse/HIVE-4365
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0, 0.10.0
Reporter: ransom.hezhiqiang
Assignee: Navis
Attachments: HIVE-4365.D10341.1.patch

A left semi join returns a wrong result while hive.optimize.ppd=true. For example:

1. Create the tables:
create table t1(c1 int, c2 int, c3 int, c4 int, c5 double, c6 int, c7 string) row format DELIMITED FIELDS TERMINATED BY '|';
create table t2(c1 int);

2. Load the data:
load data local inpath '/home/test/t1.txt' OVERWRITE into table t1;
load data local inpath '/home/test/t2.txt' OVERWRITE into table t2;

t1 data:
1|3|10003|52|781.96|555|201203
1|3|10003|39|782.96|555|201203
1|3|10003|87|783.96|555|201203
2|5|10004|24|789.96|555|201203
2|5|10004|58|788.96|555|201203

t2 data:
555

3. Execute the queries:
-- returns results:
select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 and t1.c1 = '1' and t1.c7 = '201203';
-- returns no results, which is wrong:
select t1.c1,t1.c2,t1.c3,t1.c4,t1.c5,t1.c6,t1.c7 from t1 left semi join t2 on t1.c6 = t2.c1 where t1.c1 = '1' and t1.c7 = '201203';
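For reference, a LEFT SEMI JOIN is semantically an existence check against the right table, so both queries above should return the same three rows (the c1=1 rows with c6=555). A sketch of the equivalent EXISTS form of the second query, using the tables from the report (standard SQL shown for the semantics; Hive itself did not yet support EXISTS subqueries in these versions):

```sql
-- Equivalent EXISTS formulation of the WHERE-clause query above.
-- With correct predicate pushdown, the LEFT SEMI JOIN form must match this result.
SELECT t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7
FROM t1
WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.c1 = t1.c6)
  AND t1.c1 = '1'
  AND t1.c7 = '201203';
```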
[jira] [Updated] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4365: Attachment: HIVE-4365.D10341.1.patch

navis requested code review of HIVE-4365 [jira] wrong result in left semi join.
Reviewers: JIRA

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D10341

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
ql/src/test/queries/clientpositive/semijoin.q
ql/src/test/results/clientpositive/semijoin.q.out

MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/24771/

To: JIRA, navis
[jira] [Commented] (HIVE-4365) wrong result in left semi join
[ https://issues.apache.org/jira/browse/HIVE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634934#comment-13634934 ] Navis commented on HIVE-4365:

Yes, it was a predicate pushdown (PPD) problem in the ReduceSink operator: the right alias of a left semi join takes all the predicates.
[jira] [Created] (HIVE-4376) Document ORC file format in Hive wiki
Lefty Leverenz created HIVE-4376: Summary: Document ORC file format in Hive wiki

Key: HIVE-4376
URL: https://issues.apache.org/jira/browse/HIVE-4376
Project: Hive
Issue Type: Bug
Components: Documentation, Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Lefty Leverenz
Assignee: Lefty Leverenz

Add a wiki page documenting the Optimized Row Columnar (ORC) file format for Hive release 0.11 ([HIVE-3874|https://issues.apache.org/jira/browse/HIVE-3874]).
Re: Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/#review19374 ---

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
https://reviews.apache.org/r/10600/#comment40085
The ST class import is needed to compile.

http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
https://reviews.apache.org/r/10600/#comment40084
There is a build failure on this line.

- Teddy Choi
Re: Review Request: enhance TRUNCATE syntax to drop data of external table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10600/ ---

(Updated April 18, 2013, 8:19 a.m.)

Review request for hive.

Changes
---
Add truncate_table_force.q and import org.stringtemplate.v4.ST;

Description
---
https://issues.apache.org/jira/browse/HIVE-4367
This addresses bug HIVE-4367.

Diffs (updated)
---
http://svn.apache.org/repos/asf/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/conf/hive-default.xml.template 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1469218
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/truncate_table_force.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/truncate_table_force.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/10600/diff/

Testing
---
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');
-- truncate EXTERNAL table
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';
TRUNCATE TABLE external1 FORCE;
select * from external1;

Thanks, fangkun cao
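The truncate_table_force.q file is still PRE-CREATION in this diff, so its committed contents are not shown here. Based on the queries the review exercises, a plausible sketch of such a qfile (hypothetical, not the actual file) would be:

```sql
-- truncate_table_force.q (hypothetical sketch for HIVE-4367)
-- Verify that TRUNCATE ... FORCE removes data from an EXTERNAL table.
create external table external1 (a int, b int) partitioned by (ds string);
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-08');
load data local inpath '../data/files/kv1.txt' into table external1 partition (ds='2008-04-09');

-- Truncate one partition: only ds='2008-04-08' should come back empty.
TRUNCATE TABLE external1 PARTITION (ds='2008-04-08') FORCE;
select * from external1 where ds='2008-04-08';
select * from external1 where ds='2008-04-09';

-- Truncate the whole table: all partitions should come back empty.
TRUNCATE TABLE external1 FORCE;
select * from external1;

drop table external1;
```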
[jira] [Commented] (HIVE-4371) some issue with merging join trees
[ https://issues.apache.org/jira/browse/HIVE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634999#comment-13634999 ] Namit Jain commented on HIVE-4371:

I am not sure about the last test case. Why are the left alias(es) and right alias(es) not correct for that?

some issue with merging join trees

Key: HIVE-4371
URL: https://issues.apache.org/jira/browse/HIVE-4371
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Navis
Attachments: HIVE-4371.D10323.1.patch

[~navis], I would really appreciate it if you could take a look. I am attaching a test case for which, in the optimizer, the join context's left aliases and right aliases do not look correct.
Re: Review Request: New code for VectorizedRowBatch to form basis of vectorized query execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/#review19379 ---

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40093
These comments violate the coding conventions. ColumnVector and VectorizedRowBatch have the same problem.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40092
Move the constant 1.2 to a static final float and refer to it by name.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40091
Formatting.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
https://reviews.apache.org/r/10592/#comment40090
Please throw a runtime exception here instead of relying on asserts (which can be disabled).

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java
https://reviews.apache.org/r/10592/#comment40089
Please correct the formatting issues in this file.

- Carl Steinbach

On April 18, 2013, 1:27 a.m., Eric Hanson wrote:
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/ ---
(Updated April 18, 2013, 1:27 a.m.)

Review request for hive.

Description
---
New code for VectorizedRowBatch to form the basis of vectorized query execution.
This addresses bug HIVE-4284. https://issues.apache.org/jira/browse/HIVE-4284

Diffs
---
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java PRE-CREATION

Diff: https://reviews.apache.org/r/10592/diff/

Testing
---

Thanks, Eric Hanson
[jira] [Updated] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3891: Attachment: hive.3891.14.patch

physical optimizer changes for auto sort-merge join

Key: HIVE-3891
URL: https://issues.apache.org/jira/browse/HIVE-3891
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.11.patch, hive.3891.12.patch, hive.3891.13.patch, hive.3891.14.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch
[jira] [Commented] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635005#comment-13635005 ] Namit Jain commented on HIVE-3891:

[~ashutoshc], all the tests passed. Since this was accepted some time back, can you take a look again?
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635108#comment-13635108 ] Namit Jain commented on HIVE-4095:

More comments.

Add exchange partition in Hive

Key: HIVE-4095
URL: https://issues.apache.org/jira/browse/HIVE-4095
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Dheeraj Kumar Singh
Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt
hi
Hi,

Since we are developing at a very fast pace, it would be really useful to think about the maintainability and testing of this large codebase. Historically, we have not focused on a few things, and they might soon bite us. I wanted to propose the following for all checkins:

1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
2. A convention for variable/function names: do we have any?
3. If possible, the name of the test (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
4. Especially for query optimizations, it might be a good idea to have a simple working query at the top along with the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation at the top.
5. Comments in each test (.q file) that include the JIRA number, what the test is trying to verify, and the assumptions behind each query.
6. Reduce the output of each test: whenever a query outputs more than 10 results, there should be a reason for it; otherwise, each query result should be bounded by 10 rows.

In general, focusing on a lot of comments in the code will go a long way toward helping everyone follow along.

Thanks,
-namit
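As an illustration of the proposed conventions for test files, a qfile header following them might look like this (a hypothetical example; the table names and JIRA number are invented for illustration):

```sql
-- JIRA: HIVE-0000 (hypothetical). Tests that predicate pushdown does not
-- drop WHERE-clause filters on the left table of a left semi join.
-- Assumption: src_small contains fewer than 10 matching rows, so the
-- result stays within the 10-row output bound.
select s.key, s.value
from src_small s left semi join src_keys k on s.key = k.key
where s.key < 10
limit 10;
```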
[jira] [Commented] (HIVE-4304) Remove unused builtins and pdk submodules
[ https://issues.apache.org/jira/browse/HIVE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635233#comment-13635233 ] Ashutosh Chauhan commented on HIVE-4304: [~traviscrawford] Rebasing your branch after commit of HIVE-4278 and with few minor edits, I was able to build successfully. Remove unused builtins and pdk submodules - Key: HIVE-4304 URL: https://issues.apache.org/jira/browse/HIVE-4304 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Attachments: HIVE-4304.1.patch Moving from email. The [builtins|http://svn.apache.org/repos/asf/hive/trunk/builtins/] and [pdk|http://svn.apache.org/repos/asf/hive/trunk/pdk/] submodules are not believed to be in use and should be removed. The main benefits are simplification and maintainability of the Hive code base. Forwarded conversation Subject: builtins submodule - is it still needed? From: Travis Crawford traviscrawf...@gmail.com Date: Thu, Apr 4, 2013 at 2:01 PM To: u...@hive.apache.org, dev@hive.apache.org Hey hive gurus - Is the builtins hive submodule in use? The submodule was added in HIVE-2523 as a location for builtin-UDFs, but it appears to not have taken off. Any objections to removing it? DETAILS For HIVE-4278 I'm making some build changes for the HCatalog integration. The builtins submodule causes issues because it delays building until the packaging phase - so HCatalog can't depend on builtins, which it does transitively. While investigating a path forward I discovered the builtins submodule contains very little code, and likely could either go away entirely or merge into ql, simplifying things both for users and developers. Thoughts? Can anyone with context help me understand builtins, both in general and around its non-standard build? For your trouble I'll either make the submodule go away/merge into another submodule, or update the docs with what we learn. Thanks! 
Travis

--
From: Ashutosh Chauhan ashutosh.chau...@gmail.com
Date: Fri, Apr 5, 2013 at 3:10 PM
To: dev@hive.apache.org
Cc: u...@hive.apache.org

I haven't used it myself at any time till now. Nor have I met anyone who has used it or plans to use it.

Ashutosh

--
From: Gunther Hagleitner ghagleit...@hortonworks.com
Date: Fri, Apr 5, 2013 at 3:11 PM
To: dev@hive.apache.org
Cc: u...@hive.apache.org

+1. I would actually go a step further and propose to remove both PDK and builtins. I've gone through the code for both and here is what I found:

Builtins:
- BuiltInUtils.java: empty file
- UDAFUnionMap: merges maps. Doesn't seem to be useful by itself, but was intended as a building block for PDK.

PDK:
- some helper build.xml/test setup + teardown scripts
- classes/annotations to help run unit tests
- rot13 as an example

From what I can tell it's a fair assessment that it hasn't taken off; the last commits to it seem to have happened more than 1.5 years ago.

Thanks, Gunther.

--
From: Owen O'Malley omal...@apache.org
Date: Fri, Apr 5, 2013 at 4:45 PM
To: u...@hive.apache.org

+1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyway. *smile*

-- Owen

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
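The Rot13{In,Out}putFormat example mentioned above is built around the classic rot13 transform. As an illustration only (a standalone sketch, not the Hive class itself), rot13 rotates each ASCII letter 13 places, making it its own inverse:

```python
def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters pass through."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

# rot13 is its own inverse: applying it twice returns the original text.
assert rot13(rot13("Hive")) == "Hive"
```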
[jira] [Commented] (HIVE-4225) HiveServer2 does not support SASL QOP
[ https://issues.apache.org/jira/browse/HIVE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635236#comment-13635236 ] Joey Echeverria commented on HIVE-4225: --- Does it make sense to push this as is and then have a follow-up issue tied to [HIVE-4232]? HiveServer2 does not support SASL QOP - Key: HIVE-4225 URL: https://issues.apache.org/jira/browse/HIVE-4225 Project: Hive Issue Type: Bug Components: HiveServer2, Shims Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Chris Drome Fix For: 0.11.0 Attachments: HIVE-4225.patch HiveServer2 implements Kerberos authentication through SASL framework, but does not support setting QOP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Super like it.

On 4/18/13 5:31 AM, Namit Jain nj...@fb.com wrote:

Hi,

Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Historically, we have not focused on a few things, and they might soon bite us. I wanted to propose the following for all checkins:

1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
2. Convention for variable/function names: do we have any?
3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
4. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes (e.g. the operator tree for that query at each step, or a detailed explanation at the top).
5. Comments in each test (.q file) that include the JIRA number, what it is trying to test, and assumptions about each query.
6. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason; otherwise, each query result should be bounded by 10 rows.

In general, focusing on a lot of comments in the code will go a long way toward helping everyone follow along.

Thanks,
-namit
[jira] [Created] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
Gang Tim Liu created HIVE-4377: -- Summary: Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) Key: HIVE-4377 URL: https://issues.apache.org/jira/browse/HIVE-4377 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gang Tim Liu Assignee: Navis thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Specially, for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For e.g.. The operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) that should include the jira number, what is it trying to test. Assumptions about each query. 5. Reduce the output for each test whenever query is outputting more than 10 results, it should have a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Hi Namit,

I like your proposal very much and I would take it a bit further:

1. ... For any complex function, clear examples (input/output) would really help.

I'm concerned that examples in the code (comments) might very quickly become obsolete, since it can easily happen that someone changes the code without updating the example. What about using normal unit tests for this purpose? Developers will still be able to see the expected input/output, but in addition we will have an automatic way to detect (possibly incompatible) changes. Please note that I'm not suggesting abandoning the *.q file tests, just also including unit tests for complex methods.

Jarcec

On Thu, Apr 18, 2013 at 12:31:10PM +, Namit Jain wrote:
[quoted text trimmed]
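Jarcec's point, that a unit test can serve as an always-verified input/output example, can be sketched like this (hypothetical function and test, not actual Hive code):

```python
def dedup_preserve_order(items):
    """Return items with duplicates removed, keeping first occurrences."""
    seen, result = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

# The test doubles as documentation: this input/output pair is verified
# on every run, so it cannot silently go stale the way an example left
# in a code comment can.
def test_dedup_example():
    assert dedup_preserve_order([3, 1, 3, 2, 1]) == [3, 1, 2]

test_dedup_example()
```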
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4095: -- Attachment: HIVE-4095.D10347.1.patch sindheeraj requested code review of HIVE-4095 [jira] Add exchange partition in Hive. Reviewers: JIRA JIRA changes TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10347 AFFECTED FILES .gitignore metastore/if/hive_metastore.thrift metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableExchangePartition.java ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java ql/src/test/queries/clientnegative/exchange_partition_neg_incomplete_partition.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists2.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists3.q ql/src/test/queries/clientnegative/exchange_partition_neg_partition_missing.q ql/src/test/queries/clientnegative/exchange_partition_neg_table_missing.q ql/src/test/queries/clientnegative/exchange_partition_neg_table_missing2.q 
ql/src/test/queries/clientnegative/exchange_partition_neg_test.q ql/src/test/queries/clientpositive/exchange_partition.q ql/src/test/queries/clientpositive/exchange_partition2.q ql/src/test/queries/clientpositive/exchange_partition3.q ql/src/test/results/clientnegative/exchange_partition_neg_incomplete_partition.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists2.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists3.q.out ql/src/test/results/clientnegative/exchange_partition_neg_partition_missing.q.out ql/src/test/results/clientnegative/exchange_partition_neg_table_missing.q.out ql/src/test/results/clientnegative/exchange_partition_neg_table_missing2.q.out ql/src/test/results/clientnegative/exchange_partition_neg_test.q.out ql/src/test/results/clientpositive/exchange_partition.q.out ql/src/test/results/clientpositive/exchange_partition2.q.out ql/src/test/results/clientpositive/exchange_partition3.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/24783/ To: JIRA, sindheeraj Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: (was: HIVE-4095.part11.patch.txt) Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: (was: HIVE-4095.part12.patch.txt) Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635282#comment-13635282 ] Phabricator commented on HIVE-4095: --- sindheeraj has abandoned the revision HIVE-4095 [jira] Add exchange partition in Hive. REVISION DETAIL https://reviews.facebook.net/D10347 To: JIRA, sindheeraj Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
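Conceptually, the exchange-partition feature moves a partition's data from one table to another, failing when the partition is missing from the source or already present in the destination (the cases the clientnegative tests in the patch cover). A toy model of that contract, with tables modeled as dicts rather than HDFS directories and metastore entries (illustrative only, not the DDLTask implementation):

```python
def exchange_partition(source, dest, partition):
    """Move one partition (key -> rows) from source table to dest table.

    Mirrors the negative tests listed in the patch: error when the
    partition already exists in the destination, or is absent from
    the source. (Toy model: real Hive moves HDFS directories and
    updates metastore entries.)
    """
    if partition in dest:
        raise ValueError("partition already exists in destination table")
    if partition not in source:
        raise ValueError("partition missing in source table")
    # The data changes ownership atomically from the caller's view.
    dest[partition] = source.pop(partition)

src = {"ds=11": ["row1", "row2"]}
dst = {}
exchange_partition(src, dst, "ds=11")
# dst now owns the partition's rows; src no longer has the partition.
```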
Re: hi
Agreed.

On 4/18/13 9:19 PM, Jarek Jarcec Cecho jar...@apache.org wrote:
[quoted text trimmed]
Re: hi
Hi,

I like the proposal as well!

On Thu, Apr 18, 2013 at 10:49 AM, Jarek Jarcec Cecho jar...@apache.org wrote:
[quoted text trimmed]

I'd be interested in including more unit tests as well. I like the existing q-file test framework, but when working on code I find that unit tests which complete in less than a second allow for much faster iteration than waiting 30 or so seconds for a q-file test to complete.

Brock
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dheeraj Kumar Singh updated HIVE-4095: -- Attachment: HIVE-4095.part11.patch.txt HIVE-4095.part12.patch.txt Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635285#comment-13635285 ] Dheeraj Kumar Singh commented on HIVE-4095: --- The revision is still https://reviews.facebook.net/D10035, I have abandoned the extra revision phabricator had created. Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: hi
Having said that, it might be difficult to write unit tests for operator trees. It might take more time initially, so making it a hard constraint might slow us down.

On 4/18/13 9:41 PM, Brock Noland br...@cloudera.com wrote:
[quoted text trimmed]
Re: hi
Agreed. Given that most of our existing tests are in .q files, I'd prefer to see more of a "unit tests highly encouraged" policy as opposed to "must have unit tests".

On Thu, Apr 18, 2013 at 11:17 AM, Namit Jain nj...@fb.com wrote:
[quoted text trimmed]

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635314#comment-13635314 ] Phabricator commented on HIVE-2340: --- njain has commented on the revision HIVE-2340 [jira] optimize orderby followed by a groupby. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 nit: spelling Abstract ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 The order in which the rules are specified matter, since in case of exact match for costs, the last rule is invoked. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 What are the semantics of trustScript ? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 can you add more comments ? REVISION DETAIL https://reviews.facebook.net/D1209 BRANCH DPAL-592 ARCANIST PROJECT hive To: JIRA, hagleitn, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Fix For: 0.11.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2019) Implement NOW() UDF
[ https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635327#comment-13635327 ] Eric Hanson commented on HIVE-2019: --- Agreed, especially with the phrase "right before executing the query". The timestamp should be obtained once at query execution startup time, not at compile time. Although these two steps are pretty much the same in Hive now, someday there could be a plan cache, and a cached NOW() result would go stale. Or, if a compilation takes a long time for some reason, NOW() could go stale. This is how it is done in one commercial DBMS that I know of. If there are multiple different flavors of date and time functions, they should all be based off the same internal high-resolution timestamp. That way they would all be consistent within one query execution if multiple functions are used, say DATE(), NOW(), etc., in the same query. Implement NOW() UDF --- Key: HIVE-2019 URL: https://issues.apache.org/jira/browse/HIVE-2019 Project: Hive Issue Type: New Feature Components: UDF Reporter: Carl Steinbach Assignee: Priyadarshini Attachments: HIVE-2019.patch Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
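The behavior Eric describes, capturing one timestamp at query-execution start and reusing it for every time function in the query, can be sketched as follows (hypothetical names, not Hive's actual implementation):

```python
import datetime

class QueryContext:
    """Captures a single timestamp when query execution starts."""
    def __init__(self):
        self.start_time = datetime.datetime.now()

def udf_now(ctx):
    # Every call within a single query returns the same instant, so
    # NOW(), DATE(), etc. stay mutually consistent, and a cached or
    # slow-to-compile plan cannot leak a stale compile-time value.
    return ctx.start_time

ctx = QueryContext()  # created at execution start, not at parse/compile time
```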
[jira] [Updated] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-4333: Attachment: HIVE-4333.1.patch.txt most windowing tests fail on hadoop 2 - Key: HIVE-4333 URL: https://issues.apache.org/jira/browse/HIVE-4333 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Matthew Weaver Attachments: HIVE-4333.1.patch.txt Problem is different order of results on hadoop 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635362#comment-13635362 ] Harish Butani commented on HIVE-4333: -

Attached a patch. The changes fall into these categories:

- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by p_mfgr'. In these cases rows within a partition do not come in the same order as in hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer 1 has 2 rows with exactly the same data, so if we use a 'row based window' there are diffs between the two. Changed to using a 'range based window'.
- There are diffs because of precision. Some of the avg and sum functions are now wrapped in 'round'.
- Finally, tests with the empty over() on functions that relied on order had to be changed, e.g. leadlag.q Query 8. I tried the following change:

{noformat}
select p_name, p_retailprice,
       lead(p_retailprice) over() as l1,
       lag(p_retailprice) over() as l2
from (select p_name, p_retailprice from part
      where p_mfgr = 'Manufacturer#1'
      order by p_name, p_retailprice) p;
{noformat}

The output in hadoop 1 is:

{noformat}
almond antique burnished rose metallic      1173.15  1173.15  NULL
almond antique burnished rose metallic      1173.15  1753.76  1173.15
almond antique chartreuse lavender yellow   1753.76  1602.59  1173.15
almond antique salmon chartreuse burlywood  1602.59  1414.42  1753.76
almond aquamarine burnished black steel     1414.42  1632.66  1602.59
almond aquamarine pink moccasin thistle     1632.66  NULL     1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and is very small, just 6 rows (so only 1 mapper is involved). In 1.0 the rows come to the reducer in the same order as the input. In hadoop 2.0 the result is:

{noformat}
almond aquamarine pink moccasin thistle     1632.66  1414.42  NULL
almond aquamarine burnished black steel     1414.42  1602.59  1632.66
almond antique salmon chartreuse burlywood  1602.59  1753.76  1414.42
almond antique chartreuse lavender yellow   1753.76  1173.15  1602.59
almond antique burnished rose metallic      1173.15  1173.15  1753.76
almond antique burnished rose metallic      1173.15  NULL     1173.15
{noformat}

Looks like the shuffle in 2.0 reorders the rows even in this case.

most windowing tests fail on hadoop 2 - Key: HIVE-4333 URL: https://issues.apache.org/jira/browse/HIVE-4333 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Matthew Weaver Attachments: HIVE-4333.1.patch.txt Problem is different order of results on hadoop 2
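The lead/lag semantics behind the diffs above can be sketched outside Hive. Over an ordered list of rows, lead looks one row ahead and lag one row behind, producing NULL (here None) where the window runs off either end, so the output is only deterministic when the row order is (a simplified standalone model, not Hive's PTF implementation):

```python
def lead(values, offset=1):
    """Value offset rows after the current row, or None past the end."""
    return [values[i + offset] if i + offset < len(values) else None
            for i in range(len(values))]

def lag(values, offset=1):
    """Value offset rows before the current row, or None before the start."""
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]

# p_retailprice in the hadoop 1 row order from the comment above:
prices = [1173.15, 1173.15, 1753.76, 1602.59, 1414.42, 1632.66]
# With a pinned row order, lead/lag output is deterministic too, which is
# why the fixed tests force the order with an explicit ORDER BY key.
```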
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635424#comment-13635424 ] Namit Jain commented on HIVE-4095: -- +1 Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4019) Ability to create and drop temporary partition function
[ https://issues.apache.org/jira/browse/HIVE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635470#comment-13635470 ] Brock Noland commented on HIVE-4019: https://reviews.facebook.net/D10353 Ability to create and drop temporary partition function --- Key: HIVE-4019 URL: https://issues.apache.org/jira/browse/HIVE-4019 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4019-1.patch, HIVE-4019.2.patch, HIVE-4019-3.patch, HIVE-4019-4.patch, hive-4019.q Just like udf/udaf/udtf functions, user should be able to add and drop custom partitioning functions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Install
Hi, how do I install Hive on my personal desktop? I am very new to Hive and my background is in legacy mainframe systems. Can you please advise in detail? Thanks, Suman
[jira] [Created] (HIVE-4378) Counters hit performance even when not used
Gunther Hagleitner created HIVE-4378: Summary: Counters hit performance even when not used Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4378: - Attachment: HIVE-4378.1.patch Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635508#comment-13635508 ] Gunther Hagleitner commented on HIVE-4378: -- https://reviews.facebook.net/D10359 Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Summary: Implement vectorized column-scalar expressions (was: Implement vectorized arithmetic expressions.) Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Implement arithmetic expressions that operate on vectors of columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4379) Implement Vectorized Column-Column expressions
Jitendra Nath Pandey created HIVE-4379: -- Summary: Implement Vectorized Column-Column expressions Key: HIVE-4379 URL: https://issues.apache.org/jira/browse/HIVE-4379 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey This covers the expressions involving two columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
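As background for these vectorization sub-tasks, the core idea of a column-column expression can be sketched outside Hive: instead of interpreting the expression once per row over boxed objects, the operator runs a tight loop over primitive arrays for one batch. The class and method names below are hypothetical, not Hive's actual API:

```java
// Illustrative sketch only: a vectorized long-column + long-column add.
// One call evaluates a whole batch, so there are no per-row virtual calls
// or boxing on the hot path.
public class LongColAddLongColSketch {
  public static void evaluate(long[] left, long[] right, long[] out, int size) {
    for (int i = 0; i < size; i++) {
      out[i] = left[i] + right[i]; // simple data-parallel loop, JIT-friendly
    }
  }
}
```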
[jira] [Updated] (HIVE-4378) Counters hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4378: - Status: Patch Available (was: Open) Counters hit performance even when not used --- Key: HIVE-4378 URL: https://issues.apache.org/jira/browse/HIVE-4378 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-4378.1.patch preprocess/postprocess counters perform a number of computations even when there are no counters to update. Performance runs are captured in: https://issues.apache.org/jira/browse/HIVE-4318 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4380) Implement Vectorized Scalar-Column expressions
Jitendra Nath Pandey created HIVE-4380: -- Summary: Implement Vectorized Scalar-Column expressions Key: HIVE-4380 URL: https://issues.apache.org/jira/browse/HIVE-4380 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Implement the expressions with a scalar as the first operand.
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4318: - Attachment: HIVE-4318.3.patch

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
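As a hedged sketch of the direction these fixes point in (all names here are hypothetical, not the actual HIVE-4318/HIVE-4378 patches): rather than commenting the calls out, the per-row hot path can guard the bookkeeping so that when no hooks are registered it pays only a cheap empty-check:

```java
// Illustrative guarded-forward pattern (hypothetical names): skip hook/context
// bookkeeping entirely unless at least one hook is registered.
import java.util.ArrayList;
import java.util.List;

public class GuardedForward {
  interface OperatorHook {
    void enter(Object row, int tag);
    void exit(Object row, int tag);
  }

  private final List<OperatorHook> hooks = new ArrayList<>();
  private long processedRows = 0;

  public void addHook(OperatorHook h) { hooks.add(h); }

  public void process(Object row, int tag) {
    // Cheap check: no context allocation, no hook calls when nothing is registered.
    boolean hasHooks = !hooks.isEmpty();
    if (hasHooks) {
      for (OperatorHook h : hooks) h.enter(row, tag);
    }
    processOp(row, tag);
    if (hasHooks) {
      for (OperatorHook h : hooks) h.exit(row, tag);
    }
  }

  private void processOp(Object row, int tag) { processedRows++; }

  public long getProcessedRows() { return processedRows; }
}
```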
[jira] [Created] (HIVE-4381) Implement vectorized aggregation expressions
Jitendra Nath Pandey created HIVE-4381: -- Summary: Implement vectorized aggregation expressions Key: HIVE-4381 URL: https://issues.apache.org/jira/browse/HIVE-4381 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Remus Rusanu Vectorized implementation for sum, min, max, average and count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Description: Implement arithmetic expressions involving a column and a scalar with column as first argument. (was: Implement arithmetic expressions that operate on vectors of columns.) Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635527#comment-13635527 ] Gunther Hagleitner commented on HIVE-4318: -- Thanks. I've rebased the patch and split it into two. HIVE-4378 has the changes for the counters; this one is about the operator hooks/profiler. This way I am hoping it's easier to start the work on re-introducing the profiler, because only the relevant changes are captured in this patch.

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4318: - Status: Patch Available (was: Open)

OperatorHooks hit performance even when not used
Key: HIVE-4318
URL: https://issues.apache.org/jira/browse/HIVE-4318
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC (64 bit)
Reporter: Gopal V
Assignee: Gunther Hagleitner
Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt

Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. For a count(1) query, tested with and without the operator hook calls:

{code:title=with}
2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec
Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
OK
28800991
Time taken: 40.407 seconds, Fetched: 1 row(s)
{code}

{code:title=without}
2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec
...
Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
OK
28800991
Time taken: 35.907 seconds, Fetched: 1 row(s)
{code}

The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query.

The modification made to test this was:

{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Attachment: HIVE-4282.1.patch Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635539#comment-13635539 ] Jitendra Nath Pandey commented on HIVE-4282: A patch is uploaded. All the files in the ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/ directory are generated from a template. The template files are in ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/. The code to generate the files is in CodeGen.java. We plan to add an ant task to generate the files from the templates, which we will do in a follow-up JIRA. Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument.
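The template-expansion approach described above can be illustrated with a toy generator. The template text and all names below are invented for illustration; the real templates live under ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ and the real generator is CodeGen.java:

```java
// Toy version of template-driven code generation: expand one template string
// once per (name, type, operator) tuple to emit a per-type expression class.
public class MiniCodeGen {
  static final String TEMPLATE =
      "public class <Name>ColScalar {\n"
    + "  public <Type> eval(<Type> col, <Type> scalar) { return col <Op> scalar; }\n"
    + "}\n";

  // Substitute placeholders; a real generator would also write the file out.
  public static String generate(String name, String type, String op) {
    return TEMPLATE.replace("<Name>", name)
                   .replace("<Type>", type)
                   .replace("<Op>", op);
  }
}
```

Running it over the cross product of types and operators yields the many small generated classes seen in the gen/ directory.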
Re: Install
On Thu, Apr 18, 2013 at 11:59 PM, Suman Prabhala sumanprabh...@gmail.com wrote:
> ...please suggest me in detail manner.
Read, work out, learn :) *Thanks Regards* ∞ Shashwat Shriparv
Re: Review Request: New code for VectorizedRowBatch to form basis of vectorized query execution
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10592/ --- (Updated April 18, 2013, 7:41 p.m.) Review request for hive. Changes --- Updated based on additional code review comments. Description --- New code for VectorizedRowBatch to form basis of vectorized query execution This addresses bug HIVE-4284. https://issues.apache.org/jira/browse/HIVE-4284 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatch.java PRE-CREATION Diff: https://reviews.apache.org/r/10592/diff/ Testing --- Thanks, Eric Hanson
[jira] [Updated] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4284: -- Attachment: HIVE-4284.5.patch modified patch with updates based on code review comments Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635591#comment-13635591 ] Eric Hanson commented on HIVE-4284: --- New diff available for review at https://reviews.apache.org/r/10592/ Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635597#comment-13635597 ] Carl Steinbach commented on HIVE-4284: -- +1 Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4350) support AS keyword for table alias
[ https://issues.apache.org/jira/browse/HIVE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4350: Assignee: Matthew Weaver support AS keyword for table alias -- Key: HIVE-4350 URL: https://issues.apache.org/jira/browse/HIVE-4350 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Thejas M Nair Assignee: Matthew Weaver The SQL standard supports an optional AS keyword when creating a table alias. http://savage.net.au/SQL/sql-92.bnf.html#table reference Hive gives an error when the optional keyword is used - select * from tiny as t1; org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: ParseException line 1:19 mismatched input 'as' expecting EOF near 'tiny'
[jira] [Commented] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635660#comment-13635660 ] Ashutosh Chauhan commented on HIVE-3891: Left some comments on Phabricator. physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.11.patch, hive.3891.12.patch, hive.3891.13.patch, hive.3891.14.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635669#comment-13635669 ] Jitendra Nath Pandey commented on HIVE-4282: The patch is up on review board. https://reviews.apache.org/r/10608/ Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635676#comment-13635676 ] Thiruvel Thirumoolan commented on HIVE-3620: [~sho.shimauchi] Did you have any special parameters for DataNucleus to get this working? I tried disabling the DataNucleus cache and also set connection pools, but that does not seem to help. I will also post a snapshot of the memory dump I have. BTW, I tried dropping a table with 45k partitions with the batch size configured to 100 and 1000. Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup
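The batching experiment mentioned above (batch sizes of 100 and 1000) amounts to splitting one huge metastore call into bounded chunks so each RPC stays under the client socket timeout. A hedged sketch of that pattern, with a hypothetical Metastore interface standing in for the real client:

```java
// Illustrative only: drop a large partition list in fixed-size batches instead
// of one monolithic call. The Metastore interface here is a stand-in, not the
// real Hive metastore client API.
import java.util.List;

public class BatchDropper {
  interface Metastore {
    void dropPartitions(List<String> partitionNames);
  }

  // Returns the number of RPC calls made; each call handles at most batchSize names.
  public static int dropInBatches(Metastore ms, List<String> parts, int batchSize) {
    int calls = 0;
    for (int i = 0; i < parts.size(); i += batchSize) {
      ms.dropPartitions(parts.subList(i, Math.min(i + batchSize, parts.size())));
      calls++;
    }
    return calls;
  }
}
```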
[jira] [Updated] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-3620: --- Attachment: Hive-3620_HeapDump.jpg Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar Attachments: Hive-3620_HeapDump.jpg hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4278) HCat needs to get current Hive jars instead of pulling them from maven repo
[ https://issues.apache.org/jira/browse/HIVE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635700#comment-13635700 ] Hudson commented on HIVE-4278: -- Integrated in Hive-trunk-hadoop2 #165 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/165/]) HIVE-4278 : HCat needs to get current Hive jars instead of pulling them from maven repo (Sushanth Sowmyan via Ashutosh Chauhan) (Revision 1469348) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1469348 Files : * /hive/trunk/beeline/ivy.xml * /hive/trunk/build-common.xml * /hive/trunk/build.properties * /hive/trunk/cli/ivy.xml * /hive/trunk/hcatalog/build-support/ant/deploy.xml * /hive/trunk/hcatalog/build.properties * /hive/trunk/hcatalog/core/pom.xml * /hive/trunk/hcatalog/hcatalog-pig-adapter/pom.xml * /hive/trunk/hcatalog/pom.xml * /hive/trunk/hcatalog/server-extensions/pom.xml * /hive/trunk/hcatalog/storage-handlers/hbase/pom.xml * /hive/trunk/hcatalog/webhcat/java-client/pom.xml * /hive/trunk/hcatalog/webhcat/svr/pom.xml * /hive/trunk/hwi/ivy.xml * /hive/trunk/ql/build.xml * /hive/trunk/ql/ivy.xml HCat needs to get current Hive jars instead of pulling them from maven repo --- Key: HIVE-4278 URL: https://issues.apache.org/jira/browse/HIVE-4278 Project: Hive Issue Type: Sub-task Components: Build Infrastructure, HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Sushanth Sowmyan Priority: Blocker Fix For: 0.11.0 Attachments: HIVE-4278.approach2.patch, HIVE-4278.approach2.patch.2.for.branch.11, HIVE-4278.approach2.patch.2.for.branch.12, HIVE-4278.approach2.patch.3.for.branch.12, HIVE-4278.D10257.1.patch, HIVE-4278.D9981.1.patch The HCatalog build is currently pulling Hive jars from the maven repo instead of using the ones built as part of the current build. Now that it is part of Hive it should use the jars being built instead of pulling them from maven. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3620) Drop table using hive CLI throws error when the total number of partition in the table is around 50K.
[ https://issues.apache.org/jira/browse/HIVE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3620: --- Attachment: HIVE-3620 Heapdump detail.png Just to add even more detail, this leak-report indicates Datanucleus's ConnectionFactoryImpl seems to retain a majority of the memory being leaked (440 MB, in this case). Drop table using hive CLI throws error when the total number of partition in the table is around 50K. - Key: HIVE-3620 URL: https://issues.apache.org/jira/browse/HIVE-3620 Project: Hive Issue Type: Bug Reporter: Arup Malakar Attachments: HIVE-3620 Heapdump detail.png, Hive-3620_HeapDump.jpg hive drop table load_test_table_2_0; FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timedout FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask The DB used is Oracle and hive had only one table: select COUNT(*) from PARTITIONS; 54839 I can try and play around with the parameter hive.metastore.client.socket.timeout if that is what is being used. But it is 200 seconds as of now, and 200 seconds for a drop table calls seems high already. Thanks, Arup -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4200: - Attachment: HIVE-4200.2.patch Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing there own thing. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635734#comment-13635734 ] Carl Steinbach commented on HIVE-4305: -- bq. Does ivy have a completely offline mode? That is what I am most interested in and haven't been able to find it. For example, ivy.cache.ttl.default=eternal doesn't stop the downloading. [~brocknoland] I have good news and bad news. The good news is that Ivy supports completely offline builds via the resolver's useCacheOnly property. I updated my patch for HIVE-4200 with these changes and verified that offline builds work with the network cable pulled out. The bad news is that the HCatalog build is still doing its own thing and doesn't respect the offline flag, so to make this work I had to remove hcatalog from the submodule lists in build.properties. I plan to fix this over the weekend by switching hcatalog over to Ivy. Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
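As a rough illustration of the offline switch Carl describes (the target and property names below are invented, not the actual HIVE-4200 wiring), an Ant build can pass an offline flag down to Ivy's resolve step, which supports a useCacheOnly option to resolve strictly from the local cache:

```xml
<!-- Illustrative sketch only: project/target/property names are hypothetical. -->
<project xmlns:ivy="antlib:org.apache.ivy.ant" name="demo" default="resolve">
  <property name="offline" value="false"/>
  <target name="resolve">
    <!-- With -Doffline=true, Ivy will not touch the network. -->
    <ivy:resolve file="ivy.xml" useCacheOnly="${offline}"/>
  </target>
</project>
```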
[jira] [Comment Edited] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635723#comment-13635723 ] Carl Steinbach edited comment on HIVE-4200 at 4/18/13 9:48 PM: --- Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing their own thing. was (Author: cwsteinbach): Updated version of the patch that fixes offline mode. I verified that 'ant clean package -Doffline=true' works with the network cable pulled out. The downside is that I had to disable the HCatalog build since they're still doing there own thing. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4284. Resolution: Fixed Committed to vectorization branch. Thanks, Eric! Thanks Jitendra and Carl for reviewing! Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4103) Remove System.gc() call from the map-join local-task loop
[ https://issues.apache.org/jira/browse/HIVE-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635782#comment-13635782 ] Gunther Hagleitner commented on HIVE-4103: -- I took some time to test out the two versions of the code. I ran a number of mapjoins ranging from small to at the limit and finally over the limit. In summary: Without the gc calls we overestimate the used memory very slightly. The biggest one I've seen is ~1%. The errors btw always cause the estimates to be more conservative, never less. The performance benefit on the other hand is quite substantial: On that large run it went from 120s to 56s with Gopals patch. I think we should move forward with this. Largest run: With Patch: {noformat} 2013-04-18 05:29:36 Starting to launch local task to process map join; maximum memory = 1065484288 2013-04-18 05:29:42 Processing rows:20 Hashtable size: 19 Memory usage: 108807528 rate: 0.102 2013-04-18 05:29:44 Processing rows:30 Hashtable size: 29 Memory usage: 158575416 rate: 0.149 2013-04-18 05:29:46 Processing rows:40 Hashtable size: 39 Memory usage: 211033848 rate: 0.198 2013-04-18 05:29:48 Processing rows:50 Hashtable size: 49 Memory usage: 260673400 rate: 0.245 2013-04-18 05:29:50 Processing rows:60 Hashtable size: 59 Memory usage: 310156256 rate: 0.291 2013-04-18 05:29:53 Processing rows:70 Hashtable size: 69 Memory usage: 359750536 rate: 0.338 2013-04-18 05:29:54 Processing rows:80 Hashtable size: 79 Memory usage: 417989768 rate: 0.392 2013-04-18 05:29:57 Processing rows:90 Hashtable size: 89 Memory usage: 460568536 rate: 0.432 2013-04-18 05:29:58 Processing rows:100 Hashtable size: 99 Memory usage: 510475320 rate: 0.479 2013-04-18 05:30:01 Processing rows:110 Hashtable size: 109 Memory usage: 559513584 rate: 0.525 2013-04-18 05:30:03 Processing rows:120 Hashtable size: 119 Memory usage: 609277088 rate: 0.572 2013-04-18 05:30:06 Processing rows:130 Hashtable size: 129 Memory usage: 659366968 rate: 0.619 
2013-04-18 05:30:07 Processing rows:140 Hashtable size: 139 Memory usage: 708744832 rate: 0.665 2013-04-18 05:30:08 Processing rows:150 Hashtable size: 149 Memory usage: 758335688 rate: 0.712 2013-04-18 05:30:13 Processing rows:160 Hashtable size: 159 Memory usage: 825625224 rate: 0.775 2013-04-18 05:30:14 Processing rows:1646400 Hashtable size: 1646400 Memory usage: 848652056 rate: 0.796 2013-04-18 05:30:14 Dump the hashtable into file: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable 2013-04-18 05:30:32 Upload 1 File to: file:/tmp/hrt_qa/hive_2013-04-18_17-29-31_517_1864473086199137375/-local-10002/HashTable-Stage-1/MapJoin-cd-11--.hashtable File size: 127593266 2013-04-18 05:30:32 End of local task; Time Taken: 56.264 sec. {noformat} Without patch: {noformat} 2013-04-18 05:55:22 Starting to launch local task to process map join; maximum memory = 1065484288 2013-04-18 05:55:29 Processing rows:20 Hashtable size: 19 Memory usage: 108779608 rate: 0.102 2013-04-18 05:55:33 Processing rows:30 Hashtable size: 29 Memory usage: 157203744 rate: 0.148 2013-04-18 05:55:37 Processing rows:40 Hashtable size: 39 Memory usage: 208667552 rate: 0.196 2013-04-18 05:55:42 Processing rows:50 Hashtable size: 49 Memory usage: 258126352 rate: 0.242 2013-04-18 05:55:46 Processing rows:60 Hashtable size: 59 Memory usage: 307734104 rate: 0.289 2013-04-18 05:55:51 Processing rows:70 Hashtable size: 69 Memory usage: 357043768 rate: 0.335 2013-04-18 05:55:57 Processing rows:80 Hashtable size: 79 Memory usage: 415059928 rate: 0.39 2013-04-18 05:56:04 Processing rows:90 Hashtable size: 89 Memory usage: 460135344 rate: 0.432 2013-04-18 05:56:10 Processing rows:100 Hashtable size: 99 Memory usage: 509690176 rate: 0.478 2013-04-18 05:56:18 Processing rows:110 Hashtable size: 109 Memory usage: 559042448 rate: 0.525 2013-04-18 05:56:25 Processing rows:120 Hashtable size: 119 Memory usage: 608652728 rate: 0.571 2013-04-18 
05:56:33 Processing rows:130 Hashtable
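The "rate" column in these logs comes from the JVM's own memory accounting. A minimal sketch of the computation (class and method names are illustrative, not Hive's actual local-task code) shows why dropping the System.gc() calls only makes the estimate more conservative: without a collection beforehand, freeMemory() still counts uncollected garbage as used, so the rate can only be overestimated, matching the ~1% error observed above.

```java
public class MemoryRateSketch {
    // Illustrative version of the local task's memory check; Hive's real
    // implementation lives in the map-join local-task loop.
    static double usedMemoryRate() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                      // "maximum memory" in the logs above
        long used = rt.totalMemory() - rt.freeMemory(); // includes uncollected garbage
        // Without a prior System.gc(), 'used' can only err on the high side,
        // so the estimate is conservative, never optimistic.
        return (double) used / max;
    }

    public static void main(String[] args) {
        System.out.println("memory usage rate: " + usedMemoryRate());
    }
}
```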
[jira] [Assigned] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reassigned HIVE-4305: Assignee: Carl Steinbach Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Assignee: Carl Steinbach Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4266: - Priority: Blocker (was: Major) Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2055) Hive HBase Integration issue
[ https://issues.apache.org/jira/browse/HIVE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated HIVE-2055: - Attachment: HIVE-2055.patch HIVE-2055.patch fixes the bin/hive script to include the HBase and HCat libs in the classpath Hive HBase Integration issue Key: HIVE-2055 URL: https://issues.apache.org/jira/browse/HIVE-2055 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: sajith v Attachments: HIVE-2055.patch Created an external table in Hive which points to an HBase table. When I tried to query a column using the column name in the select clause, I got the following exception: ( java.lang.ClassNotFoundException: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat), errorCode:12, SQLState:42000) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635805#comment-13635805 ] Carl Steinbach commented on HIVE-4266: -- Marking this a blocker for 0.11.0. bq. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. I think this will create more problems than it fixes. Permanently supporting these shell classes will be a long-term maintenance burden and headache for all involved. The other option is to add them temporarily, but what does that really accomplish? I think for most folks upgrading to the new namespace should be as simple as running this command on their source tree: {noformat} % perl -p -i.bak -e 's|org\.apache\.hcatalog|org.apache.hive.hcatalog|g' `find . -name '*.java'` {noformat} Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4284: - Labels: VectorEngine (was: ) Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Labels: VectorEngine Attachments: HIVE-4284.3.patch, HIVE-4284.4.patch, HIVE-4284.5.patch Vectorized row batch object will represent the row batch that vectorized operators will work on. Refer to design spec attached to HIVE-4160 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4357) BeeLine tests are not getting executed
[ https://issues.apache.org/jira/browse/HIVE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635816#comment-13635816 ] Rob Weltman commented on HIVE-4357: --- Test udf7 in ql (probably one of the last ones in ql) fails for me because of conflicting junit versions in the CLASSPATH. I had to do a little trickery to get past that but could then verify that the beeline tests are executed and pass. BeeLine tests are not getting executed -- Key: HIVE-4357 URL: https://issues.apache.org/jira/browse/HIVE-4357 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Carl Steinbach Assignee: Rob Weltman Fix For: 0.11.0 Attachments: HIVE-4357.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4357) BeeLine tests are not getting executed
[ https://issues.apache.org/jira/browse/HIVE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635818#comment-13635818 ] Carl Steinbach commented on HIVE-4357: -- +1. Will commit if tests pass. BeeLine tests are not getting executed -- Key: HIVE-4357 URL: https://issues.apache.org/jira/browse/HIVE-4357 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Carl Steinbach Assignee: Rob Weltman Fix For: 0.11.0 Attachments: HIVE-4357.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4266) Refactor HCatalog code to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635827#comment-13635827 ] Alan Gates commented on HIVE-4266: -- We cannot make this kind of backwards-incompatible change for users. Users will not see this as "here, run this script against your source tree." They'll see it as having to modify, re-test, and re-deploy every application. We should not make this a blocker for 0.11. I'm 90% of the way through the patch, but it will take a fair amount of testing when I'm done to ensure that it works with both org.apache.hcatalog and org.apache.hive.hcatalog. Refactor HCatalog code to org.apache.hive.hcatalog -- Key: HIVE-4266 URL: https://issues.apache.org/jira/browse/HIVE-4266 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Currently HCatalog code is in the org.apache.hcatalog package. It now needs to move to org.apache.hive.hcatalog. Shell classes/interfaces need to be created for public-facing classes so that users' code does not break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4305) Use a single system for dependency resolution
[ https://issues.apache.org/jira/browse/HIVE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635831#comment-13635831 ] Carl Steinbach commented on HIVE-4305: -- Owen, I think you just won the debate! In Ant you have to type -Doffline=true. I tried figuring out how many extra characters that is but kept losing count. Use a single system for dependency resolution - Key: HIVE-4305 URL: https://issues.apache.org/jira/browse/HIVE-4305 Project: Hive Issue Type: Improvement Components: Build Infrastructure, HCatalog Reporter: Travis Crawford Assignee: Carl Steinbach Both Hive and HCatalog use ant as their build tool. However, Hive uses ivy for dependency resolution while HCatalog uses maven-ant-tasks. With the project merge we should converge on a single tool for dependency resolution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4200) Consolidate submodule dependencies using ivy inheritance
[ https://issues.apache.org/jira/browse/HIVE-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635833#comment-13635833 ] Gunther Hagleitner commented on HIVE-4200: -- Thanks [~cwsteinbach]. Just saw this. I'll take a look tonight. Feel free to take over the jira, if it seems I am becoming the bottleneck on this. Consolidate submodule dependencies using ivy inheritance Key: HIVE-4200 URL: https://issues.apache.org/jira/browse/HIVE-4200 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4200.1.patch.txt, HIVE-4200.2.patch As discussed in 4187: For easier maintenance of ivy dependencies across submodules: Create parent ivy file with consolidated dependencies and include into submodules via inheritance. This way we're not relying on transitive dependencies, but also have the dependencies in a single place. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635852#comment-13635852 ] Dheeraj Kumar Singh commented on HIVE-4095: --- HIVE-4095.part12.patch.txt and HIVE-4095.part11.patch.txt are the two relevant patches. HIVE-4095.part11.patch.txt has the diff that was put up and HIVE-4095.part12.patch.txt has the thrift changes. Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4282) Implement vectorized column-scalar expressions
[ https://issues.apache.org/jira/browse/HIVE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4282: --- Attachment: HIVE-4282.2.patch Uploaded a new patch fixing style issues. Also uploaded on the review board. https://reviews.apache.org/r/10608/ Implement vectorized column-scalar expressions -- Key: HIVE-4282 URL: https://issues.apache.org/jira/browse/HIVE-4282 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4282.1.patch, HIVE-4282.2.patch Implement arithmetic expressions involving a column and a scalar with column as first argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635922#comment-13635922 ] Ashutosh Chauhan commented on HIVE-4318: +1. Will commit if tests pass. OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.3.patch, HIVE-4318.patch.pam.txt Operator hooks inserted into Operator.java cause a performance hit even when they are not being used. Below are timings for a count(1) query with and without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that have to forward the row: the more operators there are, the slower the query. 
The modification made to test this was:
{code:title=Operator.java}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
@@ -526,16 +526,16 @@
   public void process(Object row, int tag) throws HiveException {
       return;
     }
     OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag);
-    preProcessCounter();
-    enterOperatorHooks(opHookContext);
+    //preProcessCounter();
+    //enterOperatorHooks(opHookContext);
     processOp(row, tag);
-    exitOperatorHooks(opHookContext);
-    postProcessCounter();
+    //exitOperatorHooks(opHookContext);
+    //postProcessCounter();
   }
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
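Rather than deleting the calls outright, one lower-risk option is to guard the per-row hook and counter work behind a cheap emptiness check, so the cost vanishes when no hooks are registered. The sketch below is illustrative only: the class, interface, and method names are invented for the example, not the actual Operator.java API, and this is not the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of guarding per-row hook dispatch; names are illustrative.
public class GuardedOperatorSketch {
    interface OperatorHook {
        void enter(Object row, int tag);
        void exit(Object row, int tag);
    }

    private final List<OperatorHook> hooks = new ArrayList<>();
    long rowsProcessed = 0;

    void process(Object row, int tag) {
        // A single branch per row is far cheaper than unconditionally
        // allocating a context object and walking an empty hook list.
        boolean hasHooks = !hooks.isEmpty();
        if (hasHooks) {
            for (OperatorHook h : hooks) h.enter(row, tag);
        }
        rowsProcessed++; // stands in for processOp(row, tag)
        if (hasHooks) {
            for (OperatorHook h : hooks) h.exit(row, tag);
        }
    }
}
```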
[jira] [Commented] (HIVE-4371) some issue with merging join trees
[ https://issues.apache.org/jira/browse/HIVE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635934#comment-13635934 ] Navis commented on HIVE-4371: - [~namit] Ran all tests and passed. I cannot see anything wrong in aliases of QBJoinTree. {noformat} TS1(b)-RS1\ TS2(c)-RS2-JOIN1-RS4\ TS3(a)-RS3/ JOIN2 TS4(d)-RS5/ JOIN2 (L=null, R=d, Ls=[a,b,c], Base=d) {noformat} In this, posBig should be 0(d) or 1(null), not 2(c) in other join context. some issue with merging join trees -- Key: HIVE-4371 URL: https://issues.apache.org/jira/browse/HIVE-4371 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Namit Jain Assignee: Navis Attachments: HIVE-4371.D10323.1.patch [~navis], I would really appreciate if you can take a look. I am attaching a testcase, for which in the optimizer the join context left aliases and right aliases do not look correct. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4095) Add exchange partition in Hive
[ https://issues.apache.org/jira/browse/HIVE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4095: - Attachment: hive.4095.1.patch Add exchange partition in Hive -- Key: HIVE-4095 URL: https://issues.apache.org/jira/browse/HIVE-4095 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Dheeraj Kumar Singh Attachments: hive.4095.1.patch, HIVE-4095.D10155.1.patch, HIVE-4095.D10155.2.patch, HIVE-4095.D10347.1.patch, HIVE-4095.part11.patch.txt, HIVE-4095.part12.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2817) Drop any table even without privilege
[ https://issues.apache.org/jira/browse/HIVE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2817: -- Attachment: HIVE-2817.D10371.1.patch chenchun requested code review of HIVE-2817 [jira] Drop any table even without privilege. Reviewers: JIRA HIVE-2817 Drop any table even without privilege You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds The table and its file in /usr/hive/warehouse, or the external file, will be deleted. If you don't have Hadoop access permission on /usr/hive/warehouse or external files, you will see a Hadoop access error 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10371 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java ql/src/test/queries/clientnegative/authorization_fail_8.q ql/src/test/queries/clientpositive/authorization_8.q ql/src/test/results/clientnegative/authorization_fail_8.q.out ql/src/test/results/clientpositive/authorization_8.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/24831/ To: JIRA, chenchun Drop any table even without privilege - Key: HIVE-2817 URL: https://issues.apache.org/jira/browse/HIVE-2817 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Benyi Wang Attachments: HIVE-2817.D10371.1.patch You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. {code} hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds {code} The table and its file in {{/usr/hive/warehouse}}, or the external file, will be deleted. If you don't have Hadoop access permission on {{/usr/hive/warehouse}} or external files, you will see a Hadoop access error {code} 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2817) Drop any table even without privilege
[ https://issues.apache.org/jira/browse/HIVE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Chun reassigned HIVE-2817: --- Assignee: Chen Chun Drop any table even without privilege - Key: HIVE-2817 URL: https://issues.apache.org/jira/browse/HIVE-2817 Project: Hive Issue Type: Bug Affects Versions: 0.7.1 Reporter: Benyi Wang Assignee: Chen Chun Attachments: HIVE-2817.D10371.1.patch You can drop any table if you use the fully qualified name 'database.table', even if you don't have any privileges. {code} hive set hive.security.authorization.enabled=true; hive revoke all on default from user test_user; hive drop table abc; hive drop table abc; Authorization failed:No privilege 'Drop' found for outputs { database:default, table:abc}. Use show grant to get more details. hive drop table default.abc; OK Time taken: 0.13 seconds {code} The table and its file in {{/usr/hive/warehouse}}, or the external file, will be deleted. If you don't have Hadoop access permission on {{/usr/hive/warehouse}} or external files, you will see a Hadoop access error {code} 12/02/23 15:35:35 ERROR hive.log: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test_user, access=WRITE, inode=/user/myetl:myetl:etl:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
[ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636073#comment-13636073 ] Phabricator commented on HIVE-2340: --- navis has commented on the revision HIVE-2340 [jira] optimize orderby followed by a groupby. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:122 When a ScriptOperator exists between RSs, it might be possible to dedup only if the script does not change the schema, the order of rows, or the values of the RS-related columns. It seems it was added for that case by He Yongqiang, the initial developer of this optimizer. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:103 Added comments ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:359 ok. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:181 done. REVISION DETAIL https://reviews.facebook.net/D1209 BRANCH DPAL-592 ARCANIST PROJECT hive To: JIRA, hagleitn, navis Cc: hagleitn, njain optimize orderby followed by a groupby -- Key: HIVE-2340 URL: https://issues.apache.org/jira/browse/HIVE-2340 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Labels: perfomance Fix For: 0.11.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, HIVE-2340.13.patch, HIVE-2340.14.patch, HIVE-2340.14.rebased_and_schema_clone.patch, HIVE-2340.15.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.13.patch, HIVE-2340.D1209.14.patch, HIVE-2340.D1209.15.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt Before 
implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer(cluster-by following group-by). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[ https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4377: -- Attachment: HIVE-4377.D10377.1.patch navis requested code review of HIVE-4377 [jira] Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340). Reviewers: JIRA HIVE-4377 Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For example, the operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) should include the JIRA number, what it is trying to test, and assumptions about each query. 5. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D10377 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/24849/ To: JIRA, navis Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) -- Key: HIVE-4377 URL: https://issues.apache.org/jira/browse/HIVE-4377 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gang Tim Liu Assignee: Navis Attachments: HIVE-4377.D10377.1.patch thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209: 1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help. 2. Especially for query optimizations, it might be a good idea to have a simple working query at the top, and the expected changes. For example, the operator tree for that query at each step, or a detailed explanation at the top. 3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change. 4. Comments in each test (.q file) should include the JIRA number, what it is trying to test, and assumptions about each query. 5. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason. Otherwise, each query result should be bounded by 10 rows. thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[ https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4377:
------------------------
    Status: Patch Available  (was: Open)

Initial comments for review

Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
------------------------------------------------------------------

                Key: HIVE-4377
                URL: https://issues.apache.org/jira/browse/HIVE-4377
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
           Reporter: Gang Tim Liu
           Assignee: Navis
        Attachments: HIVE-4377.D10377.1.patch

Thanks a lot for addressing the optimization in HIVE-2340. Awesome!
Since we are developing at a very fast pace, it would be really useful to think about the maintainability and testing of this large codebase. Highlights applicable to D1209:

1. Javadoc for all public/private functions, except setters/getters. For any complex function, clear input/output examples would really help.
2. Especially for query optimizations, it might be a good idea to put a simple working query at the top together with the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation.
3. If possible, the name of the test (.q file) where the function is invoked, or, for a query-processor change, the query that would exercise that scenario.
4. Comments in each test (.q file) stating the JIRA number, what it is trying to test, and the assumptions behind each query.
5. Reduce the output of each test: whenever a query outputs more than 10 results there should be a reason; otherwise each query result should be bounded by 10 rows.

Thanks a lot.
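As a concrete illustration of points 1 and 2 above, a helper documented in the suggested style might look like the sketch below. The class, method, and logic are hypothetical examples invented for illustration; they are not taken from ReduceSinkDeDuplication.java or the D10377 patch.

```java
import java.util.List;

// Hypothetical helper, shown only to illustrate the Javadoc conventions
// requested in this review (not actual Hive code).
public final class KeyColumnUtils {

  /**
   * Returns true if the child key columns are a prefix of the parent key
   * columns, i.e. the child's ordering is already guaranteed by the parent
   * and the two operators could be merged.
   *
   * <p>Examples: parent keys [key, value], child keys [key] -> true;
   * parent keys [key], child keys [value] -> false.
   *
   * @param parentKeys key column names of the parent operator
   * @param childKeys  key column names of the child operator
   * @return whether childKeys is a (possibly equal) prefix of parentKeys
   */
  public static boolean isPrefix(List<String> parentKeys, List<String> childKeys) {
    if (childKeys.size() > parentKeys.size()) {
      return false;
    }
    for (int i = 0; i < childKeys.size(); i++) {
      if (!childKeys.get(i).equals(parentKeys.get(i))) {
        return false;
      }
    }
    return true;
  }

  private KeyColumnUtils() {
  }
}
```

The in-doc input/output examples make the contract checkable at a glance, which is what guideline 1 asks for on complex functions.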
[jira] [Commented] (HIVE-4278) HCat needs to get current Hive jars instead of pulling them from maven repo
[ https://issues.apache.org/jira/browse/HIVE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636103#comment-13636103 ]

Hudson commented on HIVE-4278:
------------------------------

Integrated in Hive-trunk-h0.21 #2070 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2070/])
HIVE-4278 : HCat needs to get current Hive jars instead of pulling them from maven repo (Sushanth Sowmyan via Ashutosh Chauhan) (Revision 1469348)

Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1469348
Files :
* /hive/trunk/beeline/ivy.xml
* /hive/trunk/build-common.xml
* /hive/trunk/build.properties
* /hive/trunk/cli/ivy.xml
* /hive/trunk/hcatalog/build-support/ant/deploy.xml
* /hive/trunk/hcatalog/build.properties
* /hive/trunk/hcatalog/core/pom.xml
* /hive/trunk/hcatalog/hcatalog-pig-adapter/pom.xml
* /hive/trunk/hcatalog/pom.xml
* /hive/trunk/hcatalog/server-extensions/pom.xml
* /hive/trunk/hcatalog/storage-handlers/hbase/pom.xml
* /hive/trunk/hcatalog/webhcat/java-client/pom.xml
* /hive/trunk/hcatalog/webhcat/svr/pom.xml
* /hive/trunk/hwi/ivy.xml
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml

HCat needs to get current Hive jars instead of pulling them from maven repo
---------------------------------------------------------------------------

                Key: HIVE-4278
                URL: https://issues.apache.org/jira/browse/HIVE-4278
            Project: Hive
         Issue Type: Sub-task
         Components: Build Infrastructure, HCatalog
   Affects Versions: 0.11.0
           Reporter: Alan Gates
           Assignee: Sushanth Sowmyan
           Priority: Blocker
            Fix For: 0.11.0
        Attachments: HIVE-4278.approach2.patch, HIVE-4278.approach2.patch.2.for.branch.11, HIVE-4278.approach2.patch.2.for.branch.12, HIVE-4278.approach2.patch.3.for.branch.12, HIVE-4278.D10257.1.patch, HIVE-4278.D9981.1.patch

The HCatalog build is currently pulling Hive jars from the maven repo instead of using the ones built as part of the current build. Now that HCatalog is part of Hive, the build should use the jars being built instead of pulling them from maven.
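For the pom.xml files touched by this change, the general Maven pattern for depending on jars from the current build rather than a released version is to pin sibling-module dependencies to `${project.version}`, so the reactor supplies the artifact. A minimal sketch follows; the coordinates are illustrative and this is not the actual HIVE-4278 patch:

```xml
<!-- Sketch only: depend on the hive-exec jar built in this source tree,
     not a released version pulled from the maven repository. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <!-- ${project.version} resolves to the version of the current build,
       so the multi-module reactor provides the jar instead of the repo. -->
  <version>${project.version}</version>
</dependency>
```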
[jira] [Commented] (HIVE-4342) NPE for query involving UNION ALL with nested JOIN and UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636106#comment-13636106 ]

Navis commented on HIVE-4342:
-----------------------------

I've tried with trunk and got various exceptions. With the default configuration,
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Big Table Alias is null
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinLocalWork(MapJoinProcessor.java:217)
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:232)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:245)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:372)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:553)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8387)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

org.apache.hadoop.hive.ql.parse.SemanticException: Generate New MapJoin Opertor Exeception Big Table Alias is null
	at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:242)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinResolver.java:245)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:372)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:553)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:112)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8387)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at