[jira] [Created] (HIVE-4569) GetQueryPlan api in Hive Server2
Amareshwari Sriramadasu created HIVE-4569: - Summary: GetQueryPlan api in Hive Server2 Key: HIVE-4569 URL: https://issues.apache.org/jira/browse/HIVE-4569 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu It would be nice to have GetQueryPlan as a thrift api. I do not see a GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API lists it; not sure why it was not added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing
Amareshwari Sriramadasu created HIVE-4570: - Summary: More information to user on GetOperationStatus in Hive Server2 when query is still executing Key: HIVE-4570 URL: https://issues.apache.org/jira/browse/HIVE-4570 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu Currently in Hive Server2, when the query is still executing only the status is set as STILL_EXECUTING. This issue is to give more information to the user such as progress and running job handles, if possible.
Re: admin rights for Hive jira
Done. I also made sure that all of the other Hive committers have admin privileges. Thanks. Carl On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote: Carl, Please give me admin rights in Hive's jira so that I can close the 0.11.0 release and create the 0.11.1 target as described in the HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html). I'm a jira admin on 11 other Apache projects and I believe I created the Hive jira originally. *smile* Thanks, Owen
[jira] [Commented] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions
[ https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659363#comment-13659363 ] Hudson commented on HIVE-4550: -- Integrated in Hive-trunk-h0.21 #2105 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2105/]) HIVE-4550 local_mapred_error_cache fails on some hadoop versions (Gunther Hagleitner via omalley) (Revision 1483124) Result = FAILURE omalley : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483124 Files : * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java * /hive/trunk/ql/src/test/results/clientnegative/local_mapred_error_cache.q.out local_mapred_error_cache fails on some hadoop versions -- Key: HIVE-4550 URL: https://issues.apache.org/jira/browse/HIVE-4550 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4550.1.patch I've tested it manually on the upcoming 1.3 version (branch 1). We do mask job_* ids, but not job_local* ids. The fix is to extend this to both.
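The masking idea described in HIVE-4550 can be sketched as follows. This is only an illustration, not the code in QTestUtil.java: the class name `JobIdMasker`, the replacement token `job_MASKED`, and the assumed id shapes (`job_<digits>_<digits>`, `job_local_<digits>`, `job_local<digits>_<digits>`) are all assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not taken from QTestUtil.java.
final class JobIdMasker {
    // Assumed id shapes: job_<digits>[_<digits>] and job_local[<digits>]_<digits>.
    private static final Pattern JOB_ID =
        Pattern.compile("job_(?:local\\d*_)?\\d+(?:_\\d+)?");

    // Hypothetical replacement token used only in this sketch.
    static final String MASK = "job_MASKED";

    // Replaces every job id in the line, cluster or local, with the same token.
    static String mask(String line) {
        return JOB_ID.matcher(line).replaceAll(MASK);
    }

    public static void main(String[] args) {
        System.out.println(mask("Ended Job = job_201305160001_0042 with errors"));
        System.out.println(mask("Ended Job = job_local_0001 with errors"));
    }
}
```

With a single pattern covering both forms, the expected test output no longer depends on whether the Hadoop version under test emits local or cluster job ids.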
[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999
[ https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659362#comment-13659362 ] Hudson commented on HIVE-4440: -- Integrated in Hive-trunk-h0.21 #2105 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2105/]) HIVE-4440 SMB Operator spills to disk like it's 1999 (Gunther Hagleitner via omalley) (Revision 1483084) Result = FAILURE omalley : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483084 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/conf/hive-default.xml.template * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java SMB Operator spills to disk like it's 1999 -- Key: HIVE-4440 URL: https://issues.apache.org/jira/browse/HIVE-4440 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.12.0 Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch I was recently looking into a performance issue with a query that used an SMB join and was running really slowly. It turns out that the SMB join by default caches only 100 values per key before spilling to disk. That seems overly conservative to me. Changing the parameter resulted in a ~5x speedup, which is quite significant. The parameter is: hive.mapjoin.bucket.cache.size, which right now is only used by the SMB Operator as far as I can tell. The parameter was introduced originally (3 yrs ago) for the map join operator (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in a different context though, where you had to avoid running out of memory with the cached hash table in the same process, I think.
Two things I'd like to propose: a) Rename it to what it does: hive.smbjoin.cache.rows b) Set it to something less restrictive: 1 If you string together a 5 table smb join with a map join and a map-side group by aggregation you might still run out of memory, but the renamed parameter should be easier to find and reduce. For most queries, I would think that 1 is still a reasonable number to cache (On the reduce side we use 25000 for shuffle joins).
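To make the effect of the cache size concrete, here is a toy model of a per-key row buffer that "spills" whenever it is full. It is a sketch under stated assumptions and not Hive's row container: the class `SpillingRowCache`, its spill policy, and the row counts used below are invented for illustration, and the larger cache size in `main` is an arbitrary value, not the one proposed in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; Hive's actual SMB join row container is not reproduced here.
final class SpillingRowCache {
    private final int cacheRows;
    private final List<String> buffer = new ArrayList<>();
    private int spills = 0;

    SpillingRowCache(int cacheRows) {
        if (cacheRows <= 0) {
            throw new IllegalArgumentException("cacheRows must be positive");
        }
        this.cacheRows = cacheRows;
    }

    // Adds a row; if the in-memory buffer is already full, count a spill first.
    void add(String row) {
        if (buffer.size() == cacheRows) {
            spills++;        // a real operator would write the buffer to disk here
            buffer.clear();
        }
        buffer.add(row);
    }

    int spills() {
        return spills;
    }

    // Number of spills needed to buffer rowsPerKey rows with the given cache size.
    static int spillsFor(int rowsPerKey, int cacheRows) {
        SpillingRowCache cache = new SpillingRowCache(cacheRows);
        for (int i = 0; i < rowsPerKey; i++) {
            cache.add("row-" + i);
        }
        return cache.spills();
    }

    public static void main(String[] args) {
        System.out.println("cache=100:  " + spillsFor(1000, 100) + " spills");
        System.out.println("cache=5000: " + spillsFor(1000, 5000) + " spills");
    }
}
```

In this model every spill stands for a disk write, so a key with many matching rows pays repeatedly when the cache is small and not at all once the cache holds the whole group.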
Hive-trunk-h0.21 - Build # 2105 - Still Failing
Changes for Build #2078 [namit] HIVE-4409 Prevent incompatible column type changes (Dilip Joseph via namit) [namit] HIVE-4095 Add exchange partition in Hive (Dheeraj Kumar Singh via namit) [namit] HIVE-4005 Column truncation (Kevin Wilfong via namit) [namit] HIVE-3952 merge map-job followed by map-reduce job (Vinod Kumar Vavilapalli via namit) [hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. (Navis via Ashutosh Chauhan) [khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config directory (Thejas M Nair via Sushanth Sowmyan) [namit] HIVE-4181 Star argument without table alias for UDTF is not working (Navis via namit) [hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails because of null case difference (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via Ashutosh Chauhan) Changes for Build #2079 [namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409 (Namit Jain) [hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer valid (Harish Butani via Ashutosh Chauhan) Changes for Build #2080 [navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is not estimated correctly (Navis) [khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via Sushanth Sowmyan) [hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration (Billie Rinaldi via Ashutosh Chauhan) Changes for Build #2081 Changes for Build #2082 [hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh Chauhan) [hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when originating session is closed (Ashish Vaidya via Ashutosh Chauhan) [hashutosh] HIVE-4019 : Ability to create and drop temporary partition function (Brock Noland via Ashutosh Chauhan) Changes for Build #2083 [navis] HIVE-4437 Missing file on HIVE-4068 (Navis) Changes for Build #2084 Changes for Build 
#2085 Changes for Build #2086 [hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via Ashutosh Chauhan) [hashutosh] HIVE-4439 : Remove unused join configuration parameter: hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4438 : Remove unused join configuration parameter: hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-3682 : when output hive table to file,users should could have a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan) [hashutosh] HIVE-4373 : Hive Version returned by HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via Ashutosh Chauhan) Changes for Build #2087 Changes for Build #2088 [gates] HIVE-4465 webhcat e2e tests succeed regardless of exitvalue Changes for Build #2089 [cws] HIVE-3957. Add pseudo-BNF grammar for RCFile to Javadoc (Mark Grover via cws) [cws] HIVE-4497. beeline module tests don't get run by default (Thejas Nair via cws) [gangtimliu] HIVE-4474: Column access not tracked properly for partitioned tables. Samuel Yuan via Gang Tim Liu [hashutosh] HIVE-4455 : HCatalog build directories get included in tar file produced by ant tar (Alan Gates via Ashutosh Chauhan) Changes for Build #2090 Changes for Build #2091 [hashutosh] HIVE-4392 : Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4421 : Improve memory usage by ORC dictionaries (Owen Omalley via Ashutosh Chauhan) [mithun] HCATALOG-627 - Adding thread-safety to NotificationListener. 
(amalakar via mithun) Changes for Build #2092 [hashutosh] HIVE-4466 : Fix continue.on.failure in unit tests to -well- continue on failure in unit tests (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4471 : Build fails with hcatalog checkstyle error (Gunther Hagleitner via Ashutosh Chauhan) Changes for Build #2093 [omalley] HIVE-4494 ORC map columns get class cast exception in some contexts (omalley) [omalley] HIVE-4500 Ensure that HiveServer 2 closes log files. (Alan Gates via omalley) Changes for Build #2094 [navis] HIVE-4209 Cache evaluation result of deterministic expression and reuse it (Navis via namit) Changes for Build #2095 Changes for Build #2096 Changes for Build #2097 [cws] HIVE-4530. Enforce minmum ant version required in build script (Arup Malakar via cws) [omalley] Preparing RELEASE_NOTES for Hive 0.11.0rc2. Changes for Build #2098 [omalley] Update release notes for 0.11.0rc2 [omalley] HIVE-4527 Fix eclipse project template (Carl Steinbach via omalley) [omalley] HIVE-4505 Hive can't load transforms with remote scripts. (Prasad Majumdar and Gunther Hagleitner via omalley) [omalley] HIVE-4498 TestBeeLineWithArgs.testPositiveScriptFile fails (Thejas Nair via omalley) Changes for Build #2099 Changes for Build #2100 Changes for Build #2101 Changes for
[jira] [Commented] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659381#comment-13659381 ] caofangkun commented on HIVE-4568: -- env:* and system:* variables cannot be set. Other variables can already be set in beeline. Beeline needs to support resolving variables Key: HIVE-4568 URL: https://issues.apache.org/jira/browse/HIVE-4568 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Xuefu Zhang Priority: Minor Beeline currently doesn't support variable (system, env, etc) substitution as hive client does. Supporting this feature will certainly make it more usable.
[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly
[ https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4552: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Sarvesh! Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly --- Key: HIVE-4552 URL: https://issues.apache.org/jira/browse/HIVE-4552 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Fix For: vectorization-branch Attachments: Hive.4552.0.patch IsRepeating flag in ColumnVector is being set incorrectly by ORC RecordReader(RecordReaderImpl.java) and as such wrong results are being written by VectorFileSinkOperator.
[jira] [Created] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe
Harsh J created HIVE-4571: - Summary: Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe Key: HIVE-4571 URL: https://issues.apache.org/jira/browse/HIVE-4571 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Harsh J Priority: Minor HIVE-337 added support for complex data structures and also oddly added in a limit of the # of separator characters required to make that happen. When using an Avro-based table that has more than 8-10 levels of nesting in records, this limit gets hit and such tables can't be queried. We either need to remove such a limit or raise it to a high-enough value to support such nested data structures.
[jira] [Commented] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe
[ https://issues.apache.org/jira/browse/HIVE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659431#comment-13659431 ] Harsh J commented on HIVE-4571: --- A sample change would be:
{code}
diff --git a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
index 0036a8e..252ea6b 100644
--- a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
+++ b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
@@ -211,7 +211,7 @@ public class LazySimpleSerDe implements SerDe {
     // Read the separators: We use 8 levels of separators by default, but we
     // should change this when we allow users to specify more than 10 levels
     // of separators through DDL.
-    serdeParams.separators = new byte[8];
+    serdeParams.separators = new byte[32];
     serdeParams.separators[0] = getByte(tbl.getProperty(Constants.FIELD_DELIM,
         tbl.getProperty(Constants.SERIALIZATION_FORMAT)), DefaultSeparators[0]);
     serdeParams.separators[1] = getByte(tbl
{code}
Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe --- Key: HIVE-4571 URL: https://issues.apache.org/jira/browse/HIVE-4571 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Harsh J Priority: Minor HIVE-337 added support for complex data structures and also oddly added in a limit of the # of separator characters required to make that happen. When using an Avro-based table that has more than 8-10 levels of nesting in records, this limit gets hit and such tables can't be queried. We either need to remove such a limit or raise it to a high-enough value to support such nested data structures.
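Why a fixed-size separator array caps the nesting depth can be sketched as below. This is a simplified model, not LazySimpleSerDe: the class `SeparatorLevels`, its methods, and the placeholder separator bytes are hypothetical, and the only facts taken from the thread are the array sizes 8 and 32.

```java
// Simplified model of "one separator byte per nesting level"; not Hive code.
final class SeparatorLevels {
    private final byte[] separators;

    SeparatorLevels(int maxLevels) {
        separators = new byte[maxLevels];
        for (int i = 0; i < maxLevels; i++) {
            separators[i] = (byte) (i + 1);  // placeholder values for the sketch
        }
    }

    // Each nesting level needs its own separator, so depth is bounded by the array size.
    boolean supports(int nestingDepth) {
        return nestingDepth <= separators.length;
    }

    // Returns the separator for a zero-based nesting level, or fails if none is configured.
    byte separatorFor(int level) {
        if (level < 0 || level >= separators.length) {
            throw new IllegalArgumentException(
                "nesting level " + level + " has no separator; only "
                    + separators.length + " are configured");
        }
        return separators[level];
    }

    public static void main(String[] args) {
        System.out.println(new SeparatorLevels(8).supports(10));
        System.out.println(new SeparatorLevels(32).supports(10));
    }
}
```

Raising the array size, as in the sample diff, only moves the ceiling; removing the limit altogether would need a scheme for deriving separators at arbitrary depth, which the thread leaves open.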
Re: admin rights for Hive jira
Hi Carl, As a side question, though not a committer, I expect to be able to assign a JIRA to myself, especially for those created by me, so as to let other people know that the JIRA is being worked on. Is this reasonable? I can see other projects give this privilege to non-committers. Could you please comment? Thanks, Xuefu On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com wrote: Done. I also made sure that all of the other Hive committers have admin privileges. Thanks. Carl On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote: Carl, Please give me admin rights in Hive's jira so that I can close the 0.11.0 release and create the 0.11.1 target as described in the HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html). I'm a jira admin on 11 other Apache projects and I believe I created the Hive jira originally. *smile* Thanks, Owen
Re: admin rights for Hive jira
While you guys are on the topic, I am on the pmc and I do not think I can edit my own comments. On Thu, May 16, 2013 at 9:50 AM, Xuefu Zhang xzh...@cloudera.com wrote: Hi Carl, As a side question, though not a committer, I expect to be able to assign a JIRA to myself, especially for those created by me, so as to let other people know that the JIRA is being worked on. Is this reasonable? I can see other projects give this privilege to non-committers. Could you please comment? Thanks, Xuefu On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com wrote: Done. I also made sure that all of the other Hive committers have admin privileges. Thanks. Carl On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote: Carl, Please give me admin rights in Hive's jira so that I can close the 0.11.0 release and create the 0.11.1 target as described in the HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html). I'm a jira admin on 11 other Apache projects and I believe I created the Hive jira originally. *smile* Thanks, Owen
[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces
[ https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4554: Fix Version/s: (was: 0.11.0) 0.11.1 Failed to create a table from existing file if file path has spaces --- Key: HIVE-4554 URL: https://issues.apache.org/jira/browse/HIVE-4554 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0 Reporter: Xuefu Zhang Fix For: 0.11.1 Attachments: HIVE-4554.patch, HIVE-4554.patch.1 To reproduce the problem, 1. Create a table, say, person_age (name STRING, age INT). 2. Create a file whose name has a space in it, say, data set.txt. 3. Try to load the data in the file to the table. The following error can be seen in the console: hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age; Loading data to table default.person_age Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask Note: the error message is confusing.
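The thread does not state the root cause of HIVE-4554. As background only, the sketch below shows how a path containing a space behaves with `java.net.URI`, which is one common way such paths go wrong in Java code; whether this is what happens inside Hive's load path is an assumption, and the class `SpaceInPathDemo` is invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Background illustration only; not a diagnosis of HIVE-4554.
final class SpaceInPathDemo {
    // The single-argument constructor parses its input strictly: a raw space is illegal.
    static boolean parsesAsUri(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The multi-argument constructor quotes illegal characters in the path component.
    static String percentEncodePath(String path) {
        try {
            return new URI(null, null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/home/xzhang/temp/data set.txt";
        System.out.println(parsesAsUri(path));
        System.out.println(percentEncodePath(path));
    }
}
```

Code that mixes the two forms, encoding in one place and comparing against the unencoded name in another, can end up looking for a file called `data%20set.txt`, which would produce a misleading error much like the one reported.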
[jira] [Updated] (HIVE-4565) TestCliDriver and TestParse fail with non Sun Java
[ https://issues.apache.org/jira/browse/HIVE-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4565: Fix Version/s: (was: 0.11.0) 0.11.1 TestCliDriver and TestParse fail with non Sun Java -- Key: HIVE-4565 URL: https://issues.apache.org/jira/browse/HIVE-4565 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.11.0 Environment: RedHat x86 IBM Java 6 Reporter: Renata Ghisloti Duarte de Souza Priority: Minor Fix For: 0.11.1 Attachments: HIVE-4565.patch While executing Hive's unit tests, two testcases have different outputs with Sun Java and non-Sun Java (such as IBM): TestCliDriver and TestParse. The differences are mainly due to the use of HashMaps in the creation of the Logical Plan in the analyzeInternal method. Sun Java presents the elements of a HashMap in one order, and non-Sun Java in a different order. Both outputs are correct, and don't affect the final query result. I propose the attached patch to make Hive unit tests compliant with all JVMs. The patch adds the output files and a change to ql/build.xml.
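The ordering issue behind HIVE-4565 comes from `HashMap` making no guarantee about iteration order, so it may differ between JVM implementations. The sketch below shows an alternative technique, an insertion-ordered `LinkedHashMap`, which is not what the attached patch does (the patch adds per-JVM output files); the class `StableOrderDemo` and the sample keys are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrates a different approach from the HIVE-4565 patch: deterministic iteration order.
final class StableOrderDemo {
    // LinkedHashMap iterates in insertion order on every JVM, unlike HashMap,
    // whose iteration order is unspecified.
    static List<String> keysInInsertionOrder(String... keys) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInInsertionOrder("src", "key", "value"));
    }
}
```

The trade-off is that switching map types touches production code to satisfy tests, whereas the patch's approach leaves the planner untouched and adjusts only the expected outputs.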
Re: admin rights for Hive jira
On Thu, May 16, 2013 at 6:50 AM, Xuefu Zhang xzh...@cloudera.com wrote: Hi Carl, As a side question, though not a committer, I expect to be able to assign a JIRA to myself, especially for those created by me, so as to let other people know that the JIRA is being worked on. Is this reasonable? I've added you to the contributors list. You should be able to assign jiras to yourself. -- Owen I can see other projects give this privilege to non-committers. Could you please comment? Thanks, Xuefu On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com wrote: Done. I also made sure that all of the other Hive committers have admin privileges. Thanks. Carl On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote: Carl, Please give me admin rights in Hive's jira so that I can close the 0.11.0 release and create the 0.11.1 target as described in the HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html). I'm a jira admin on 11 other Apache projects and I believe I created the Hive jira originally. *smile* Thanks, Owen
Re: admin rights for Hive jira
On Thu, May 16, 2013 at 7:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote: While you guys are on the topic, I am on the pmc and I do not think I can edit my own comments. With the Hadoop jira security model, which is what Hive's jira uses, only admins can edit comments. Now that Carl granted you admin rights, you can edit comments. Use it judiciously, especially if others have responded to your comment, since editing can make the conversation difficult to follow. -- Owen
[jira] [Assigned] (HIVE-4566) NullPointerException if typeinfo and nativesql commands are executed at beeline before a DB connection is established
[ https://issues.apache.org/jira/browse/HIVE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-4566: - Assignee: Xuefu Zhang NullPointerException if typeinfo and nativesql commands are executed at beeline before a DB connection is established - Key: HIVE-4566 URL: https://issues.apache.org/jira/browse/HIVE-4566 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Before a DB connection is established, executing a command such as typeinfo or nativesql results in an NPE shown at the console: beeline> !typeinfo java.lang.NullPointerException beeline> !nativesql java.lang.NullPointerException Instead, a message such as "No current connection" should be given, as in the case of some other commands, such as dropall.
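The behaviour requested in HIVE-4566 amounts to a null check before the command touches the connection. A minimal sketch follows, assuming a hypothetical `Connection` interface and `BeelineCommandGuard` class; neither is Beeline's real command code, and only the message text comes from the issue.

```java
// Hypothetical guard; Beeline's real command classes are not reproduced here.
final class BeelineCommandGuard {
    static final String NO_CONNECTION = "No current connection";

    // Stand-in for whatever object holds the active database connection.
    interface Connection {
        String describeTypes();
    }

    // Returns a friendly message instead of dereferencing a missing connection.
    static String typeinfo(Connection current) {
        if (current == null) {
            return NO_CONNECTION;
        }
        return current.describeTypes();
    }

    public static void main(String[] args) {
        System.out.println(typeinfo(null));
    }
}
```

Putting the check in one shared helper, rather than in each command, would keep typeinfo, nativesql, and dropall consistent in how they report a missing connection.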
[jira] [Assigned] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-4568: - Assignee: Xuefu Zhang Beeline needs to support resolving variables Key: HIVE-4568 URL: https://issues.apache.org/jira/browse/HIVE-4568 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Beeline currently doesn't support variable (system, env, etc) substitution as hive client does. Supporting this feature will certainly make it more usable.
[jira] [Assigned] (HIVE-4554) Failed to create a table from existing file if file path has spaces
[ https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-4554: - Assignee: Xuefu Zhang Failed to create a table from existing file if file path has spaces --- Key: HIVE-4554 URL: https://issues.apache.org/jira/browse/HIVE-4554 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.11.1 Attachments: HIVE-4554.patch, HIVE-4554.patch.1 To reproduce the problem, 1. Create a table, say, person_age (name STRING, age INT). 2. Create a file whose name has a space in it, say, data set.txt. 3. Try to load the data in the file to the table. The following error can be seen in the console: hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age; Loading data to table default.person_age Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask Note: the error message is confusing.
Re: Review Request: Column Column, and Column Scalar vectorized execution tests
On May 14, 2013, 6:13 p.m., Eric Hanson wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt, line 1 https://reviews.apache.org/r/11133/diff/2/?file=291141#file291141line1 test .txt templates need apache license the generated class will have the apache license (see testclass.txt) but if I add it here then every test case would have it, which would bloat the class with tons of comments. is it necessary? or is there another way to specify it? like maybe a license.txt in the directory? - tony --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/#review20538 --- On May 14, 2013, 12:34 a.m., tony murphy wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/ --- (Updated May 14, 2013, 12:34 a.m.) Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu. Description --- This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus is on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection. Overview of Changes: CodeGen.java: + joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java. + filter types now specify null as their output type rather than doesn't matter to make detection for test generation easier. + support for test generation added. TestCodeGen.java Templates: TestClass.txt TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt +This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. 
The tests cases are added over the course of vectorized expressions class generation, with test classes being outputted at the end. For each column vector (inputs and/or outputs) a matrix of pairwise covering Booleans is used to generate test cases across nulls and repeating dimensions. Based on the input column vector(s) nulls and repeating states the states of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite. VectorizedRowGroupUtil.java +added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values. This addresses bug HIVE-4553. https://issues.apache.org/jira/browse/HIVE-4553 Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java 53d9a7a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java PRE-CREATION 
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java 8a07567 Diff: https://reviews.apache.org/r/11133/diff/ Testing --- generated tests, and ran them. Thanks, tony murphy
[jira] [Commented] (HIVE-4472) OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE)
[ https://issues.apache.org/jira/browse/HIVE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659757#comment-13659757 ] Eric Hanson commented on HIVE-4472: --- I posted additional comments to review board. Patch is almost ready but not quite. Expect one more update from Jitendra. OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE) Key: HIVE-4472 URL: https://issues.apache.org/jira/browse/HIVE-4472 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Attachments: HIVE-4472.1.patch, HIVE-4472.2.patch, HIVE-4472.3.patch The issue is in the files FilterExprOrExpr.java and FilterNotExpr.java. I posted a review for you at https://reviews.apache.org/r/10752/ I think there is a bug related to sharing of an array of integers. Also, one algorithm step always takes O(DEFAULT_BATCH_SIZE) time. If n is smaller than DEFAULT_BATCH_SIZE then this is a performance issue.
[jira] [Commented] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659762#comment-13659762 ]

Owen O'Malley commented on HIVE-4486:

+1. Can you run the unit tests?

FetchOperator slows down SMB map joins by 50% when there are many partitions

Key: HIVE-4486
URL: https://issues.apache.org/jira/browse/HIVE-4486
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor
Attachments: HIVE-4486.patch, smb-profile.html

While looking at log files for SMB joins in Hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code:
{code}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
@@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException {
    * @return list of file status entries
    */
   private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException {
-    HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
-    boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
+    boolean recursive = false;
     if (!recursive) {
       return fs.listStatus(p);
     }
{code}
I then re-ran my query to compare timings.

|| ||Before||After||
|Cumulative CPU|731.07 sec|386.0 sec|
|Total time|347.66 sec|218.855 sec|

The query used was:
{code}
INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ;
{code}
This ran on a scale=2 TPC-DS data set, where both store_sales and inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The YourKit profiler runs are attached.
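The cost pattern reported in HIVE-4486 (a configuration object rebuilt on every directory listing) can be avoided by resolving the flag once and reusing it. The following is a minimal sketch of that idea only, not the attached patch: `ExpensiveConf` and `CachedFlagLister` are hypothetical stand-ins that use no Hive or Hadoop classes, and the counter exists purely to make the number of constructions visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a configuration object that is costly to construct. */
final class ExpensiveConf {
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();
    private final boolean recursive;

    ExpensiveConf(boolean recursive) {
        CONSTRUCTIONS.incrementAndGet(); // count how often the costly constructor runs
        this.recursive = recursive;
    }

    boolean isRecursive() {
        return recursive;
    }
}

/** Hypothetical lister that reads the flag once instead of once per call. */
final class CachedFlagLister {
    private final boolean recursive; // resolved a single time, at construction

    CachedFlagLister(boolean recursiveSetting) {
        this.recursive = new ExpensiveConf(recursiveSetting).isRecursive();
    }

    /** Called once per partition; no configuration object is built here. */
    String listStatusUnderPath(String path) {
        return (recursive ? "recursive:" : "flat:") + path;
    }

    public static void main(String[] args) {
        CachedFlagLister lister = new CachedFlagLister(false);
        for (int i = 0; i < 261; i++) { // 261 partitions, as in the report
            lister.listStatusUnderPath("/warehouse/inventory/part=" + i);
        }
        System.out.println("constructions = " + ExpensiveConf.CONSTRUCTIONS.get());
    }
}
```

Whether the flag may safely be cached for the lifetime of the operator depends on when the job configuration can change, which the report does not discuss.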
[jira] [Created] (HIVE-4572) ColumnPruner cannot preserve RS key columns in columnExprMap
Yin Huai created HIVE-4572:

Summary: ColumnPruner cannot preserve RS key columns in columnExprMap
Key: HIVE-4572
URL: https://issues.apache.org/jira/browse/HIVE-4572
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai

For the ReduceSink (RS) operator feeding a join, if the join key corresponding to this RS does not appear in the SELECT clause, ColumnPruner (CP) drops the entry for this column from colExprMap. Example:
{code}
SELECT x.key FROM src1 x JOIN src y ON (x.key = y.key);
{code}
Before CP:
{code}
colExprMap of RS corresponding to x: {VALUE._col3=Column[INPUT__FILE__NAME], VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {VALUE._col3=Column[INPUT__FILE__NAME], VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], VALUE._col0=Column[key]}.
{code}
After CP:
{code}
colExprMap of RS corresponding to x: {VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {}.
{code}
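The behaviour the report asks for can be modelled on plain maps: an entry survives pruning if its output column is still needed downstream or if it maps to a join key. This is a simplified sketch under stated assumptions, not ColumnPruner's code: column expressions are reduced to strings, and `KeyPreservingPruner`, `neededOutputs` and `joinKeyColumns` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class KeyPreservingPruner {
    /**
     * Keeps an entry when its output column is needed downstream,
     * or when the column it maps to is a join key.
     */
    static Map<String, String> prune(Map<String, String> colExprMap,
                                     Set<String> neededOutputs,
                                     Set<String> joinKeyColumns) {
        Map<String, String> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : colExprMap.entrySet()) {
            boolean needed = neededOutputs.contains(e.getKey());
            boolean isJoinKey = joinKeyColumns.contains(e.getValue());
            if (needed || isJoinKey) {
                pruned.put(e.getKey(), e.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Model of the RS for y in: SELECT x.key FROM src1 x JOIN src y ON (x.key = y.key)
        Map<String, String> rsForY = new LinkedHashMap<>();
        rsForY.put("VALUE._col0", "key");
        rsForY.put("VALUE._col1", "value");
        Set<String> needed = Collections.emptySet();                 // nothing from y is selected
        Set<String> joinKeys = new HashSet<>(Arrays.asList("key")); // but y.key is the join key
        System.out.println(prune(rsForY, needed, joinKeys));
    }
}
```

In this model the map for y keeps its `key` entry instead of becoming empty, which is the outcome the issue title describes.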
[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns in columnExprMap
[ https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-4572:

Attachment: HIVE-4572.replay.patch

To see the problem, apply the patch HIVE-4572.replay.patch and execute
{code}
ant test -Dtestcase=TestCliDriver -Dqfile=RSKeyLostAfterCP.q
{code}
Then search for "In pruneReduceSinkOperator oldMap" and "In pruneReduceSinkOperator newMap" in build/ql/tmp/hive.log.

ColumnPruner cannot preserve RS key columns in columnExprMap

Key: HIVE-4572
URL: https://issues.apache.org/jira/browse/HIVE-4572
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
Attachments: HIVE-4572.replay.patch
[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap
[ https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4572: Summary: ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap (was: ColumnPruner cannot preserve RS key columns in columnExprMap) Key: HIVE-4572 URL: https://issues.apache.org/jira/browse/HIVE-4572 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4572.replay.patch
[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters
[ https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659782#comment-13659782 ] Chu Tong commented on HIVE-4403: Can someone please help me to review this code? Thanks Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters Key: HIVE-4403 URL: https://issues.apache.org/jira/browse/HIVE-4403 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mark Grover Attachments: HIVE-4403.patch While working on BIGTOP-885, I saw that Hive was giving a bunch of warnings related to overriding final parameters in job.conf. This was on a pseudo distributed cluster. FWIW, I didn't see this happen on a fully-distributed cluster. Perhaps, Hive's job.conf is overriding some final parameters it shouldn't. Here is what the warnings looked like: {code} 2013-04-19 14:20:32,304 WARN [main] conf.Configuration (Configuration.java:loadProperty(2032)) - file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2013-04-19 14:20:32,367 WARN [main] conf.Configuration (Configuration.java:loadProperty(2032)) - file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. {code} To reproduce, run a query like: {code} CREATE TABLE u_data ( userid INT, movieid INT, rating INT, unixtime STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; {code} Load some data into u_data, here is some sample data: https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hive/src/main/resources/seed_data_files/ml-data/u.data Run a simple query on that data (on YARN/MR2) {code} INSERT OVERWRITE DIRECTORY '/tmp/count' SELECT COUNT(1) FROM u_data {code} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659785#comment-13659785 ] Gopal V commented on HIVE-4486: I have already run all of the tests in ql/ against svn (Wed May 8). I will run the full test suite in a while and report back. FetchOperator slows down SMB map joins by 50% when there are many partitions Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor Attachments: HIVE-4486.patch, smb-profile.html
Re: admin rights for Hive jira
Thanks. I will only use it to correct my typos. I tend to make many of them. :)

On Thu, May 16, 2013 at 11:56 AM, Owen O'Malley omal...@apache.org wrote:

On Thu, May 16, 2013 at 7:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

While you guys are on the topic, I am on the PMC and I do not think I can edit my own comments.

With the Hadoop jira security model, which is what Hive's jira uses, only admins can edit comments. Now that Carl granted you admin rights, you can edit comments. Use it judiciously, especially if others have responded to your comment, since it can make the conversation difficult to follow.

-- Owen
[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
[ https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Murphy updated HIVE-4553:

Attachment: HIVE-4160.patch

Column Column, and Column Scalar vectorized execution tests

Key: HIVE-4553
URL: https://issues.apache.org/jira/browse/HIVE-4553
Project: Hive
Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
Fix For: vectorization-branch
Attachments: HIVE-4553.patch

review board review: https://reviews.apache.org/r/11133/

This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't matter" to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java
Templates: TestClass.txt, TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. The test cases are added over the course of vectorized expression class generation, and the test classes are written out at the end. For each column vector (inputs and/or outputs) a matrix of pairwise covering Booleans is used to generate test cases across the nulls and repeating dimensions. Based on the nulls and repeating states of the input column vector(s), the state of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values.
[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
[ https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4553: Attachment: (was: HIVE-4160.patch) Column Column, and Column Scalar vectorized execution tests Key: HIVE-4553 URL: https://issues.apache.org/jira/browse/HIVE-4553 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4553.patch
Re: Review Request: Column Column, and Column Scalar vectorized execution tests
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/

(Updated May 16, 2013, 6:47 p.m.)

Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu.

Changes
-------
Updated based on review feedback.

Description
-------
This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't matter" to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java
Templates: TestClass.txt, TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. The test cases are added over the course of vectorized expression class generation, and the test classes are written out at the end. For each column vector (inputs and/or outputs) a matrix of pairwise covering Booleans is used to generate test cases across the nulls and repeating dimensions. Based on the nulls and repeating states of the input column vector(s), the state of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values.

This addresses bug HIVE-4553. https://issues.apache.org/jira/browse/HIVE-4553

Diffs (updated)
-------
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java 53d9a7a
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java 8a07567

Diff: https://reviews.apache.org/r/11133/diff/

Testing
-------
Generated the tests and ran them.

Thanks,
tony murphy
[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
[ https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4553: Attachment: HIVE-4553 (2).patch Column Column, and Column Scalar vectorized execution tests Key: HIVE-4553 URL: https://issues.apache.org/jira/browse/HIVE-4553 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4553 (2).patch, HIVE-4553.patch
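The HIVE-4553 description mentions "a matrix of pairwise covering Booleans" across the nulls and repeating dimensions. The patch's actual matrix is not shown in this thread, so the following is only an illustration of the concept: for two input column vectors there are four Boolean factors, and five rows are enough for every pair of factors to take all four true/false combinations, instead of the 16 rows of the full cross product. The class name, the column order and the matrix itself are assumptions.

```java
final class PairwiseCover {
    // Assumed factor order: input1.hasNulls, input1.isRepeating, input2.hasNulls, input2.isRepeating
    static final boolean[][] CASES = {
        {false, false, false, false},
        {false, true,  true,  true},
        {true,  false, true,  true},
        {true,  true,  false, true},
        {true,  true,  true,  false},
    };

    /** Returns true if every pair of columns shows all four true/false combinations. */
    static boolean coversAllPairs(boolean[][] rows) {
        int factors = rows[0].length;
        for (int i = 0; i < factors; i++) {
            for (int j = i + 1; j < factors; j++) {
                boolean[] seen = new boolean[4]; // index encodes (row[i], row[j])
                for (boolean[] row : rows) {
                    seen[(row[i] ? 2 : 0) + (row[j] ? 1 : 0)] = true;
                }
                for (boolean s : seen) {
                    if (!s) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("rows = " + CASES.length
                + ", pairwise covered = " + coversAllPairs(CASES));
    }
}
```

Each row would drive one generated test case, for example by passing its flags to a column-vector generator that injects nulls or repeating values on request.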
[jira] [Commented] (HIVE-4467) HiveConnection does not handle failures correctly
[ https://issues.apache.org/jira/browse/HIVE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659904#comment-13659904 ] Thiruvel Thirumoolan commented on HIVE-4467: [~cwsteinbach] Does the updated patch look good? HiveConnection does not handle failures correctly Key: HIVE-4467 URL: https://issues.apache.org/jira/browse/HIVE-4467 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Attachments: HIVE-4467_1.patch, HIVE-4467.patch HiveConnection uses the Utils.verifySuccess* routines to check whether the server reported an error. This is not handled well: in Utils.verifySuccess(), when withInfo is 'false', the condition evaluates to 'false' and no SQLException is thrown even though there could be a problem on the server.
[jira] [Commented] (HIVE-4467) HiveConnection does not handle failures correctly
[ https://issues.apache.org/jira/browse/HIVE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659916#comment-13659916 ] Carl Steinbach commented on HIVE-4467: Changes look good to me. +1. HiveConnection does not handle failures correctly Key: HIVE-4467 URL: https://issues.apache.org/jira/browse/HIVE-4467 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Attachments: HIVE-4467_1.patch, HIVE-4467.patch HiveConnection uses the Utils.verifySuccess* routines to check whether the server reported an error. This is not handled well: in Utils.verifySuccess(), when withInfo is 'false', the condition evaluates to 'false' and no SQLException is thrown even though there could be a problem on the server.
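The failure mode described in HIVE-4467 is a status check that can never throw when withInfo is false. A check without that hole might look like the sketch below. It is an assumption-laden illustration, not the committed patch: the `StatusCode` enum is a simplified stand-in for the server's status codes, and `StatusCheck` and `passes` are hypothetical names.

```java
import java.sql.SQLException;

final class StatusCheck {
    /** Simplified, hypothetical stand-in for the status codes returned by the server. */
    enum StatusCode { SUCCESS, SUCCESS_WITH_INFO, ERROR }

    /**
     * Throws unless the status is SUCCESS, or SUCCESS_WITH_INFO when the caller
     * accepts informational results. An error status throws for either value of withInfo.
     */
    static void verifySuccess(StatusCode code, boolean withInfo) throws SQLException {
        boolean ok = code == StatusCode.SUCCESS
                || (withInfo && code == StatusCode.SUCCESS_WITH_INFO);
        if (!ok) {
            throw new SQLException("Server returned status " + code);
        }
    }

    /** Convenience wrapper for demonstration: true if verifySuccess does not throw. */
    static boolean passes(StatusCode code, boolean withInfo) {
        try {
            verifySuccess(code, withInfo);
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ERROR, withInfo=false passes: " + passes(StatusCode.ERROR, false));
        System.out.println("SUCCESS, withInfo=false passes: " + passes(StatusCode.SUCCESS, false));
    }
}
```

The point of writing the condition as a positive "ok" test is that every status not explicitly accepted leads to an exception, so the value of withInfo can only widen what is accepted, never suppress an error.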
[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999
[ https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659934#comment-13659934 ]

Hudson commented on HIVE-4440:

Integrated in Hive-trunk-hadoop2 #199 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/199/])
HIVE-4440 SMB Operator spills to disk like it's 1999 (Gunther Hagleitner via omalley) (Revision 1483084)
Result = FAILURE
omalley : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483084
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java

SMB Operator spills to disk like it's 1999

Key: HIVE-4440
URL: https://issues.apache.org/jira/browse/HIVE-4440
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Fix For: 0.12.0
Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch

I was recently looking into a performance issue with a query that used an SMB join and was running really slowly. It turns out that the SMB join by default caches only 100 values per key before spilling to disk. That seems overly conservative to me. Changing the parameter resulted in a ~5x speedup, which is quite significant. The parameter is hive.mapjoin.bucket.cache.size, which right now is only used by the SMB operator as far as I can tell. The parameter was introduced originally (3 years ago) for the map join operator (apparently pre-SMB) and set to 100 to avoid OOM. That seems to have been in a different context though, where you had to avoid running out of memory with the cached hash table in the same process, I think.

Two things I'd like to propose:
a) Rename it to what it does: hive.smbjoin.cache.rows
b) Set it to something less restrictive: 10000

If you string together a 5-table SMB join with a map join and a map-side group-by aggregation you might still run out of memory, but the renamed parameter should be easier to find and reduce. For most queries, I would think that 10000 is still a reasonable number to cache (on the reduce side we use 25000 for shuffle joins).
[jira] [Commented] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions
[ https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659935#comment-13659935 ] Hudson commented on HIVE-4550: Integrated in Hive-trunk-hadoop2 #199 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/199/]) HIVE-4550 local_mapred_error_cache fails on some hadoop versions (Gunther Hagleitner via omalley) (Revision 1483124) Result = FAILURE omalley : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483124 Files : * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java * /hive/trunk/ql/src/test/results/clientnegative/local_mapred_error_cache.q.out local_mapred_error_cache fails on some hadoop versions Key: HIVE-4550 URL: https://issues.apache.org/jira/browse/HIVE-4550 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4550.1.patch I've tested it manually on the upcoming 1.3 version (branch 1). The test output masks job_* ids, but not job_local* ids. The fix is to mask both.
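The masking described in HIVE-4550 can be pictured with a single regular expression that covers both id shapes. This is only a sketch: the pattern, the `job_#ID#` placeholder and the class name are assumptions for illustration, not the code committed to QTestUtil.java.

```java
import java.util.regex.Pattern;

final class JobIdMasker {
    // Assumed id shapes: job_<digits>_<digits> and job_local..., e.g. job_local_0001
    private static final Pattern JOB_ID = Pattern.compile("job_(?:local)?[0-9_]+");

    /** Replaces every job id in the line with a fixed placeholder so outputs compare equal. */
    static String mask(String line) {
        return JOB_ID.matcher(line).replaceAll("job_#ID#");
    }

    public static void main(String[] args) {
        System.out.println(mask("Ended Job = job_local_0001"));
        System.out.println(mask("Ended Job = job_201305161350_0001"));
    }
}
```

Requiring at least one digit or underscore after the optional `local` keeps the pattern from touching unrelated names such as `hive_job_log_jenkins`.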
[jira] [Updated] (HIVE-4543) Broken link in HCat 0.5 doc (Reader and Writer Interfaces)
[ https://issues.apache.org/jira/browse/HIVE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-4543: Status: Open (was: Patch Available) Lefty, the change looks good. But could you please attach the changed .xml file rather than the .html and .pdf files? Then I can rebuild the docs from that and check in the changes. Broken link in HCat 0.5 doc (Reader and Writer Interfaces) Key: HIVE-4543 URL: https://issues.apache.org/jira/browse/HIVE-4543 Project: Hive Issue Type: Bug Components: Documentation Reporter: Lefty Leverenz Assignee: Lefty Leverenz Priority: Minor Attachments: HIVE-4543.1.patch, HIVE-4543.2.patch, readerwriter.html, readerwriter.pdf Due to HCatalog's move from the incubator to Hive, a link to TestReaderWriter.java is broken at the end of the Reader and Writer Interfaces doc for HCat 0.5 ([here|http://hive.apache.org/docs/hcat_r0.5.0/readerwriter.html#Complete+Example+Program]). This should be fixed in the html and pdf files. Thanks to Himanshu Bari for pointing this out.
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #147
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/ -- [...truncated 62234 lines...] [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2013-05-16 13:50:50,187 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] Execution completed successfully [junit] Mapred Local Task Succeeded . Convert the Join into MapJoin [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-46_848_6460139522905130875/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/tmp/hive_job_log_jenkins_201305161350_1297667893.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Copying file: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] Table default.testhivedrivertable stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0] [junit] POSTHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-51_771_4365363466253624657/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-51_771_4365363466253624657/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/tmp/hive_job_log_jenkins_201305161350_2061596878.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable
[ANNOUNCE] Apache Hive 0.11.0 Released
The Apache Hive team is proud to announce the release of Apache Hive version 0.11.0.

The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop, it provides:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase
* Query execution via MapReduce

For Hive release details and downloads, please visit: http://hive.apache.org/releases.html

Hive 0.11.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Html&projectId=12310843

We would like to thank the many contributors who made this release possible.

Regards,
The Apache Hive Team
Re: [ANNOUNCE] Apache Hive 0.11.0 Released
Congratulations! On Thu, May 16, 2013 at 4:19 PM, Owen O'Malley omal...@apache.org wrote: The Apache Hive team is proud to announce the release of Apache Hive version 0.11.0. The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop, it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase * Query execution via MapReduce For Hive release details and downloads, please visit: http://hive.apache.org/releases.html Hive 0.11.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Html&projectId=12310843 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com
[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap
[ https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-4572:
---
Attachment: HIVE-4572.1.patch.txt

Add a patch. In org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory.pruneReduceSinkOperator, RS key columns will not be removed from columnExprMap. Need to test if this change works for HIVE-2206.

ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap
---
Key: HIVE-4572
URL: https://issues.apache.org/jira/browse/HIVE-4572
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
Attachments: HIVE-4572.1.patch.txt, HIVE-4572.replay.patch

For the RS of a join operator, if the join key corresponding to this RS does not appear in the SELECT clause, ColumnPruner (CP) will drop the entry of this column in colExprMap.

Example:
{code}
SELECT x.key FROM src1 x JOIN src y ON (x.key = y.key);
{code}

Before CP,
{code}
colExprMap of RS corresponding to x: {VALUE._col3=Column[INPUT__FILE__NAME], VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {VALUE._col3=Column[INPUT__FILE__NAME], VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], VALUE._col0=Column[key]}.
{code}

After CP,
{code}
colExprMap of RS corresponding to x: {VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {}.
{code}
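The pruning behavior described in HIVE-4572 can be sketched with a toy model. This is not Hive's ColumnPruner code and not the attached patch: the function name `prune_col_expr_map`, its parameters, and the plain-dict representation of colExprMap are assumptions made for illustration only. The sketch shows how keeping entries for join-key columns, even when they are not selected, avoids the empty map seen for y above.

```python
def prune_col_expr_map(col_expr_map, selected_columns, key_columns):
    """Toy model of pruning a ReduceSink colExprMap (hypothetical helper).

    col_expr_map maps an output name such as "VALUE._col0" to the source
    column it is computed from. An entry survives if its source column is
    selected downstream OR is one of the RS key columns.
    """
    keep = set(selected_columns) | set(key_columns)
    return {name: col for name, col in col_expr_map.items() if col in keep}


# colExprMap of the RS for y before pruning, as listed in the issue.
rs_y = {
    "VALUE._col0": "key",
    "VALUE._col1": "value",
    "VALUE._col2": "BLOCK__OFFSET__INSIDE__FILE",
    "VALUE._col3": "INPUT__FILE__NAME",
}

# Ignoring key columns reproduces the reported symptom: nothing from y is
# selected, so the map ends up empty.
buggy = prune_col_expr_map(rs_y, selected_columns=[], key_columns=[])

# Treating the join key as always needed keeps its entry.
fixed = prune_col_expr_map(rs_y, selected_columns=[], key_columns=["key"])
```

In this model `buggy` is the empty map and `fixed` retains only the entry for `key`; whether the real patch keeps exactly this entry is not stated in the issue.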
Re: Moving HCatalog docs to wiki
Hi Lefty, Does my outline for the Hive wiki fit the bill? Or should HCatalog docs be parceled out among existing Hive topics? (Installation in Admin docs, and so on.) If necessary, an HCatalog overview page could show where to find everything. My personal preference is for parceling the HCatalog docs out among the existing Hive topics along with an HCatalog overview page. Thanks. Carl
[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests
[ https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Murphy updated HIVE-4553:
--
Attachment: HIVE-4553 (3).patch

Column Column, and Column Scalar vectorized execution tests
---
Key: HIVE-4553
URL: https://issues.apache.org/jira/browse/HIVE-4553
Project: Hive
Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
Fix For: vectorization-branch
Attachments: HIVE-4553 (2).patch, HIVE-4553 (3).patch, HIVE-4553.patch

review board review: https://reviews.apache.org/r/11133/

This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't matter" to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java
Templates: TestClass.txt, TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. The test cases are added over the course of vectorized expression class generation, with the test classes being output at the end. For each column vector (inputs and/or outputs) a matrix of pairwise covering Booleans is used to generate test cases across the nulls and repeating dimensions. Based on the nulls and repeating states of the input column vector(s), the state of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values.
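The generation scheme described above can be sketched roughly as follows. This is a Python illustration, not the patch's Java code: the `LongColumnVector` dataclass only mirrors the field names used in the discussion (vector, isNull, noNulls, isRepeating), and the signature of `generate_long_column_vector` is an assumption. The sketch also enumerates every nulls/repeating combination exhaustively, which is a superset of the pairwise covering matrix the patch describes, not the same thing.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class LongColumnVector:
    """Minimal stand-in for a column vector (illustrative only)."""
    vector: list
    is_null: list
    no_nulls: bool
    is_repeating: bool


def generate_long_column_vector(nulls, repeating, size, rng):
    """Build a column with optional nulls and/or a repeating value."""
    is_null = [False] * size
    if repeating:
        # Every row carries the same value. Assumption: readers of a
        # repeating vector consult only entry 0, so a null goes there.
        vector = [rng.randrange(1000)] * size
        if nulls:
            is_null[0] = True
    else:
        vector = [rng.randrange(1000) for _ in range(size)]
        if nulls:
            is_null = [rng.random() < 0.5 for _ in range(size)]
            is_null[0] = True  # guarantee at least one null
    return LongColumnVector(vector, is_null, not nulls, repeating)


def all_cases(num_columns):
    """Yield one (nulls, repeating) pair per column for every combination."""
    flags = list(itertools.product([False, True], repeat=2))
    return list(itertools.product(flags, repeat=num_columns))


rng = random.Random(0)
cases = all_cases(2)  # two input columns: 4 * 4 combinations
batches = [
    [generate_long_column_vector(n, r, 8, rng) for (n, r) in case]
    for case in cases
]
```

Each entry of `batches` is one test input: a pair of column vectors whose metadata flags a generated test would then check against the output of the expression under test.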
Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #374
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/374/
Re: Review Request: Column Column, and Column Scalar vectorized execution tests
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/#review20671
---

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
https://reviews.apache.org/r/11133/#comment42698

Either add a noNulls check or put a comment that you have filled the isNull array with false if there are no nulls, so you know this is safe. Please do this in other templates too. In general it is not safe to look at isNull entries if noNulls is true.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
https://reviews.apache.org/r/11133/#comment42685

Still some style issues. Please run ant checkstyle on the output of the template generation and update the templates to get rid of the issues. E.g. "}while"

- Eric Hanson

On May 16, 2013, 6:47 p.m., tony murphy wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11133/
---

(Updated May 16, 2013, 6:47 p.m.)

Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and Remus Rusanu.

Description
---

This patch adds Column Column, and Column Scalar vectorized execution tests. These tests are generated in parallel with the vectorized expressions. The tests focus on validating the column vector and the vectorized row batch metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't matter" to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java
Templates: TestClass.txt, TestColumnColumnFilterVectorExpressionEvaluation.txt, TestColumnColumnOperationVectorExpressionEvaluation.txt, TestColumnScalarFilterVectorExpressionEvaluation.txt, TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test cases. The test cases are added over the course of vectorized expression class generation, with the test classes being output at the end. For each column vector (inputs and/or outputs) a matrix of pairwise covering Booleans is used to generate test cases across the nulls and repeating dimensions. Based on the nulls and repeating states of the input column vector(s), the state of the output column vector (if there is one) is validated, along with the null vector. For filter operations the selection vector is validated against the generated data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for generating the respective column vectors with optional nulls and/or repeating values.

This addresses bug HIVE-4553.
https://issues.apache.org/jira/browse/HIVE-4553

Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java 53d9a7a
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java 8a07567

Diff: https://reviews.apache.org/r/11133/diff/
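The first review comment in this thread, that isNull entries should not be read when noNulls is true, can be illustrated with a small sketch. It is written in Python for brevity and is not taken from the templates under review: `count_nulls` and its parameters are hypothetical names, and the "stale" array below is a made-up example of a reused buffer.

```python
def count_nulls(is_null, no_nulls, size):
    """Count null rows, consulting is_null only when it is meaningful.

    When no_nulls is True the contents of is_null are unspecified (for
    example, left over from an earlier batch), so they are never read.
    """
    if no_nulls:
        return 0
    return sum(1 for i in range(size) if is_null[i])


# A reused buffer whose is_null entries are stale from a previous batch.
stale_is_null = [True, False, True, False]

safe = count_nulls(stale_is_null, no_nulls=True, size=4)      # guarded read
unsafe = sum(1 for flag in stale_is_null if flag)             # ignores no_nulls
```

Here the guarded count reports no nulls, while the unguarded scan reports two that are not really there. The alternative the reviewer mentions, filling isNull with false whenever there are no nulls, makes the unguarded scan safe as well, at the cost of the extra fill.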
[jira] [Updated] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-4568:
--
Attachment: HIVE-4568.patch

Beeline needs to support resolving variables

Key: HIVE-4568
URL: https://issues.apache.org/jira/browse/HIVE-4568
Project: Hive
Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
Attachments: HIVE-4568.patch

Beeline currently doesn't support variable (system, env, etc.) substitution as the Hive client does. Supporting this feature will certainly make it more usable.
[jira] [Updated] (HIVE-4568) Beeline needs to support resolving variables
[ https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-4568:
--
Fix Version/s: 0.11.1
Status: Patch Available (was: Open)

The attached patch addresses the issue.

Beeline needs to support resolving variables

Key: HIVE-4568
URL: https://issues.apache.org/jira/browse/HIVE-4568
Project: Hive
Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
Fix For: 0.11.1
Attachments: HIVE-4568.patch

Beeline currently doesn't support variable (system, env, etc.) substitution as the Hive client does. Supporting this feature will certainly make it more usable.
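As a rough illustration of the kind of substitution HIVE-4568 asks for, the sketch below resolves `${system:...}`, `${env:...}` and `${hiveconf:...}` references in a statement before it is sent. It is a Python sketch, not the attached patch and not Beeline's code: the `substitute` function, its parameters, and the choice to leave unresolved references untouched are all assumptions made for the example.

```python
import re

# Matches ${namespace:name} for the three namespaces handled in this sketch.
_VAR = re.compile(r"\$\{(system|env|hiveconf):([^}]+)\}")


def substitute(text, system_props, env, hiveconf):
    """Replace ${ns:name} references using the given lookup tables.

    References whose name is not found are left as written, so the
    caller can see what failed to resolve.
    """
    sources = {"system": system_props, "env": env, "hiveconf": hiveconf}

    def repl(match):
        namespace, name = match.group(1), match.group(2)
        value = sources[namespace].get(name)
        return match.group(0) if value is None else value

    return _VAR.sub(repl, text)


query = "select * from t where u = '${env:USER}' and d = '${hiveconf:day}'"
resolved = substitute(
    query,
    system_props={},
    env={"USER": "alice"},           # illustrative values, not real settings
    hiveconf={"day": "2013-05-16"},
)
```

A single regular-expression pass keeps the sketch short; it does not handle nested references or escaping, which a real client may need.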