date:20130516

[jira] [Created] (HIVE-4569) GetQueryPlan api in Hive Server2

2013-05-16 Thread Amareshwari Sriramadasu (JIRA)

Amareshwari Sriramadasu created HIVE-4569:
-

 Summary: GetQueryPlan api in Hive Server2
 Key: HIVE-4569
 URL: https://issues.apache.org/jira/browse/HIVE-4569
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu


It would be nice to have GetQueryPlan as a Thrift API. I do not see a GetQueryPlan API 
available in HiveServer2, although the wiki page 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API 
lists it; I am not sure why it was not added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2013-05-16 Thread Amareshwari Sriramadasu (JIRA)

Amareshwari Sriramadasu created HIVE-4570:
-

 Summary: More information to user on GetOperationStatus in Hive 
Server2 when query is still executing
 Key: HIVE-4570
 URL: https://issues.apache.org/jira/browse/HIVE-4570
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu


Currently in HiveServer2, while a query is still executing, only the status is 
set (to STILL_EXECUTING). 

This issue is to give the user more information, such as progress and running 
job handles, if possible.


Re: admin rights for Hive jira

2013-05-16 Thread Carl Steinbach

Done. I also made sure that all of the other Hive committers have admin
privileges.

Thanks.

Carl


On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote:

 Carl,
Please give me admin rights in Hive's jira so that I can close the
 0.11.0 release and create the 0.11.1 target as described in the
 HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html).
 I'm a jira admin on 11 other Apache projects and I believe I created the
 Hive jira originally. *smile*

 Thanks,
Owen

[jira] [Commented] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions

2013-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659363#comment-13659363
 ] 

Hudson commented on HIVE-4550:
--

Integrated in Hive-trunk-h0.21 #2105 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2105/])
HIVE-4550 local_mapred_error_cache fails on some hadoop versions (Gunther 
Hagleitner via omalley) (Revision 1483124)

 Result = FAILURE
omalley : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483124
Files : 
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
* /hive/trunk/ql/src/test/results/clientnegative/local_mapred_error_cache.q.out


 local_mapred_error_cache fails on some hadoop versions
 --

 Key: HIVE-4550
 URL: https://issues.apache.org/jira/browse/HIVE-4550
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4550.1.patch


 I've tested it manually on the upcoming 1.3 version (branch 1).
 We do mask job_* ids, but not job_local* ids. The fix is to extend the masking to 
 both.
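To illustrate masking both ID shapes with a single pattern, here is a minimal sketch. The class name `JobIdMasker`, the regex, and the `job_####` placeholder are assumptions made for illustration; this is not the code in QTestUtil or in HIVE-4550.1.patch.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not QTestUtil's code): masks cluster job IDs such as
// job_201305160000_0001 as well as local-runner IDs such as job_local_0001 or
// job_local1234_0001, so golden test output does not depend on the runner.
final class JobIdMasker {
    // "job_" + optional "local" + optional digits + "_" + sequence number.
    private static final Pattern JOB_ID = Pattern.compile("job_(?:local)?\\d*_\\d+");

    // The placeholder text is an assumption; any stable token works for diffs.
    static String mask(String line) {
        return JOB_ID.matcher(line).replaceAll("job_####");
    }
}
```

Because one pattern covers both shapes, a test's expected output stays the same whether the job ran on a cluster or in local mode.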


[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2013-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659362#comment-13659362
 ] 

Hudson commented on HIVE-4440:
--

Integrated in Hive-trunk-h0.21 #2105 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2105/])
HIVE-4440 SMB Operator spills to disk like it's 1999 (Gunther Hagleitner via
omalley) (Revision 1483084)

 Result = FAILURE
omalley : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483084
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java


 SMB Operator spills to disk like it's 1999
 --

 Key: HIVE-4440
 URL: https://issues.apache.org/jira/browse/HIVE-4440
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch


 I was recently looking into a performance issue with a query that used SMB 
 join and was running really slowly. It turns out that the SMB join by default 
 caches only 100 values per key before spilling to disk. That seems overly 
 conservative to me. Changing the parameter resulted in a ~5x speedup, which is quite 
 significant.
 The parameter is: hive.mapjoin.bucket.cache.size
 Right now it is only used by the SMB operator, as far as I can tell.
 The parameter was originally introduced (3 yrs ago) for the map join operator 
 (apparently pre-SMB) and set to 100 to avoid OOM. That seems to have been in 
 a different context, though, where you had to avoid running out of memory with 
 the cached hash table in the same process, I think.
 Two things I'd like to propose:
 a) Rename it to what it does: hive.smbjoin.cache.rows
 b) Set it to something less restrictive: 10000
 If you string together a 5-table SMB join with a map join and a map-side 
 group-by aggregation, you might still run out of memory, but the renamed 
 parameter should be easier to find and reduce. For most queries, I would 
 think that 10000 is still a reasonable number to cache (on the reduce side we 
 use 25000 for shuffle joins).
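The "cache N values per key, then spill" behaviour can be pictured with a small sketch. `BoundedRowBuffer` is a hypothetical class, not Hive's RowContainer or SMBMapJoinOperator; a plain list stands in for the on-disk spill file, and the constructor argument plays the role of the cache-size parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-key row cache with a spill threshold.
// The "spilled" list is only a stand-in for a (much slower) on-disk file.
final class BoundedRowBuffer<T> {
    private final int maxInMemoryRows;            // role of the cache-size parameter
    private final List<T> inMemory = new ArrayList<>();
    private final List<T> spilled = new ArrayList<>();

    BoundedRowBuffer(int maxInMemoryRows) {
        if (maxInMemoryRows <= 0) {
            throw new IllegalArgumentException("maxInMemoryRows must be positive");
        }
        this.maxInMemoryRows = maxInMemoryRows;
    }

    void add(T row) {
        if (inMemory.size() < maxInMemoryRows) {
            inMemory.add(row);
        } else {
            spilled.add(row); // in a real operator this would be a disk write
        }
    }

    int inMemoryCount() { return inMemory.size(); }

    int spilledCount() { return spilled.size(); }

    // Called when the join advances to the next key.
    void clear() {
        inMemory.clear();
        spilled.clear();
    }
}
```

With a limit of 100, a key that matches 150 rows already pushes 50 of them through the slow path, whereas a larger limit keeps them all in memory at the cost of a bigger heap footprint per key.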


Hive-trunk-h0.21 - Build # 2105 - Still Failing

2013-05-16 Thread Apache Jenkins Server

Changes for Build #2078
[namit] HIVE-4409 Prevent incompatible column type changes
(Dilip Joseph via namit)

[namit] HIVE-4095 Add exchange partition in Hive
(Dheeraj Kumar Singh via namit)

[namit] HIVE-4005 Column truncation
(Kevin Wilfong via namit)

[namit] HIVE-3952 merge map-job followed by map-reduce job
(Vinod Kumar Vavilapalli via namit)

[hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. 
(Navis via Ashutosh Chauhan)

[khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config 
directory (Thejas M Nair via Sushanth Sowmyan)

[namit] HIVE-4181 Star argument without table alias for UDTF is not working
(Navis via namit)

[hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails 
because of null case difference (Thejas Nair via Ashutosh Chauhan)

[hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via 
Ashutosh Chauhan)


Changes for Build #2079
[namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409
(Namit Jain)

[hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer 
valid (Harish Butani via Ashutosh Chauhan)


Changes for Build #2080
[navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is 
not estimated correctly (Navis)

[khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via 
Sushanth Sowmyan)

[hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration 
(Billie Rinaldi via Ashutosh Chauhan)


Changes for Build #2081

Changes for Build #2082
[hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh 
Chauhan)

[hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when 
originating session is closed (Ashish Vaidya via Ashutosh Chauhan)

[hashutosh] HIVE-4019 : Ability to create and drop temporary partition function 
(Brock Noland via Ashutosh Chauhan)


Changes for Build #2083
[navis] HIVE-4437 Missing file on HIVE-4068 (Navis)


Changes for Build #2084

Changes for Build #2085

Changes for Build #2086
[hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via 
Ashutosh Chauhan)

[hashutosh] HIVE-4439 : Remove unused join configuration parameter: 
hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4438 : Remove unused join configuration parameter: 
hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-3682 : when output hive table to file,users should could have 
a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan)

[hashutosh] HIVE-4373 : Hive Version returned by 
HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via 
Ashutosh Chauhan)


Changes for Build #2087

Changes for Build #2088
[gates] HIVE-4465 webhcat e2e tests succeed regardless of exitvalue


Changes for Build #2089
[cws] HIVE-3957. Add pseudo-BNF grammar for RCFile to Javadoc (Mark Grover via 
cws)

[cws] HIVE-4497. beeline module tests don't get run by default (Thejas Nair via 
cws)

[gangtimliu] HIVE-4474: Column access not tracked properly for partitioned 
tables. Samuel Yuan via Gang Tim Liu

[hashutosh] HIVE-4455 : HCatalog build directories get included in tar file 
produced by ant tar (Alan Gates via Ashutosh Chauhan)


Changes for Build #2090

Changes for Build #2091
[hashutosh] HIVE-4392 : Illogical InvalidObjectException throwed when use mulit 
aggregate functions with star columns  (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4421 : Improve memory usage by ORC dictionaries (Owen Omalley 
via Ashutosh Chauhan)

[mithun] HCATALOG-627 - Adding thread-safety to NotificationListener. (amalakar 
via mithun)


Changes for Build #2092
[hashutosh] HIVE-4466 : Fix continue.on.failure in unit tests to -well- 
continue on failure in unit tests (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4471 : Build fails with hcatalog checkstyle error (Gunther 
Hagleitner via Ashutosh Chauhan)


Changes for Build #2093
[omalley] HIVE-4494 ORC map columns get class cast exception in some contexts 
(omalley)

[omalley] HIVE-4500 Ensure that HiveServer 2 closes log files. (Alan Gates via 
omalley)


Changes for Build #2094
[navis] HIVE-4209 Cache evaluation result of deterministic expression and reuse 
it (Navis via namit)


Changes for Build #2095

Changes for Build #2096

Changes for Build #2097
[cws] HIVE-4530. Enforce minmum ant version required in build script (Arup 
Malakar via cws)

[omalley] Preparing RELEASE_NOTES for Hive 0.11.0rc2.


Changes for Build #2098
[omalley] Update release notes for 0.11.0rc2

[omalley] HIVE-4527 Fix eclipse project template (Carl Steinbach via omalley)

[omalley] HIVE-4505 Hive can't load transforms with remote scripts. (Prasad 
Majumdar and Gunther Hagleitner
via omalley)

[omalley] HIVE-4498 TestBeeLineWithArgs.testPositiveScriptFile fails (Thejas 
Nair via omalley)


Changes for Build #2099

Changes for Build #2100

Changes for Build #2101

Changes for

[jira] [Commented] (HIVE-4568) Beeline needs to support resolving variables

2013-05-16 Thread caofangkun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659381#comment-13659381
 ] 

caofangkun commented on HIVE-4568:
--

env:* and system:* variables cannot be set.
Other variables can already be set in Beeline.


 Beeline needs to support resolving variables
 

 Key: HIVE-4568
 URL: https://issues.apache.org/jira/browse/HIVE-4568
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Priority: Minor

 Beeline currently doesn't support variable (system, env, etc.) substitution as 
 the hive client does. Supporting this feature will certainly make it more usable.
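A minimal sketch of the kind of substitution being requested, assuming `${env:NAME}`, `${system:name}` and plain `${name}` references. `VariableSubstitutor` and its behaviour of leaving unresolved references untouched are assumptions; this is neither Beeline's nor the Hive CLI's implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${...} variable substitution for a SQL client.
final class VariableSubstitutor {
    // ${env:NAME}, ${system:name} or ${name}; the namespace prefix is optional.
    private static final Pattern VAR = Pattern.compile("\\$\\{(?:(env|system):)?([^}]+)\\}");

    static String substitute(String text, Map<String, String> userVars) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ns = m.group(1);
            String name = m.group(2);
            String value;
            if ("env".equals(ns)) {
                value = System.getenv(name);
            } else if ("system".equals(ns)) {
                value = System.getProperty(name);
            } else {
                value = userVars.get(name);
            }
            // Unresolved references are left as-is instead of failing the statement.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

`Matcher.quoteReplacement` keeps values containing `$` or `\` from being interpreted as group references.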


[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly

2013-05-16 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4552:
---

   Resolution: Fixed
Fix Version/s: vectorization-branch
   Status: Resolved  (was: Patch Available)

Committed to branch. Thanks, Sarvesh!

 Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating 
 correctly
 ---

 Key: HIVE-4552
 URL: https://issues.apache.org/jira/browse/HIVE-4552
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Fix For: vectorization-branch

 Attachments: Hive.4552.0.patch


 The isRepeating flag in ColumnVector is set incorrectly by the ORC 
 RecordReader (RecordReaderImpl.java), and as a result wrong results are 
 written by VectorFileSinkOperator. 
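To show why a stale or wrong repeating flag corrupts output, here is a simplified sketch. `RepeatingLongColumn` is a hypothetical stand-in, not Hive's LongColumnVector, and it assumes the convention (as we understand the vectorization code) that when the flag is set only entry 0 is meaningful and applies to every row. It does not claim to describe the actual bug or the fix in the patch.

```java
// Hypothetical, simplified vectorized long column (not Hive's LongColumnVector).
// Assumed convention: when isRepeating is true, vector[0] is the value of every row.
final class RepeatingLongColumn {
    final long[] vector;
    boolean isRepeating;

    RepeatingLongColumn(int capacity) {
        vector = new long[capacity];
    }

    // Loads one batch and recomputes the flag from this batch's values, so a flag
    // left over from the previous batch cannot leak into this one.
    void load(long[] values, int n) {
        isRepeating = n > 0;
        for (int i = 0; i < n; i++) {
            vector[i] = values[i];
            if (values[i] != values[0]) {
                isRepeating = false;
            }
        }
    }

    // Consumers honour the flag: with isRepeating set, every row reads vector[0].
    long get(int row) {
        return isRepeating ? vector[0] : vector[row];
    }
}
```

If a reader left `isRepeating` true for a batch whose values differ, every consumer following this convention would read `vector[0]` for all rows, which is exactly the kind of silently wrong output a downstream operator would then write.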


[jira] [Created] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe

2013-05-16 Thread Harsh J (JIRA)

Harsh J created HIVE-4571:
-

 Summary: Reinvestigate HIVE-337 induced limit on number of 
separator characters in LazySerDe
 Key: HIVE-4571
 URL: https://issues.apache.org/jira/browse/HIVE-4571
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: Harsh J
Priority: Minor


HIVE-337 added support for complex data structures and also, oddly, added a 
limit on the number of separator characters required to make that happen.

When using an Avro-based table that has more than 8-10 levels of nesting in 
records, this limit is hit and such tables can't be queried.

We need to either remove the limit or raise it to a value high enough to 
support such nested data structures.


[jira] [Commented] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe

2013-05-16 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659431#comment-13659431
 ] 

Harsh J commented on HIVE-4571:
---

A sample change would be:

{code}
diff --git a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
index 0036a8e..252ea6b 100644
--- a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
+++ b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
@@ -211,7 +211,7 @@ public class LazySimpleSerDe implements SerDe {
 // Read the separators: We use 8 levels of separators by default, but we
 // should change this when we allow users to specify more than 10 levels
 // of separators through DDL.
-serdeParams.separators = new byte[8];
+serdeParams.separators = new byte[32];
 serdeParams.separators[0] = getByte(tbl.getProperty(Constants.FIELD_DELIM,
 tbl.getProperty(Constants.SERIALIZATION_FORMAT)), DefaultSeparators[0]);
 serdeParams.separators[1] = getByte(tbl
{code}
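The sample change enlarges the `separators` array from 8 to 32 entries. The sketch below shows why a fixed-size array caps nesting depth, assuming that each nesting level consumes one entry and that levels default to the control bytes \001, \002, \003, and so on (byte value level + 1). `SeparatorLevels` is a hypothetical class, not LazySimpleSerDe code.

```java
// Hypothetical sketch: a fixed-size separator table limits how deeply
// nested a row can be. Assumes level i uses separators[i].
final class SeparatorLevels {
    static byte[] defaultSeparators(int levels) {
        byte[] seps = new byte[levels];
        for (int i = 0; i < levels; i++) {
            seps[i] = (byte) (i + 1); // \001, \002, \003, ...
        }
        return seps;
    }

    // Fails loudly when the data nests deeper than the table allows.
    static byte separatorFor(byte[] seps, int level) {
        if (level >= seps.length) {
            throw new IllegalArgumentException(
                "nesting level " + level + " exceeds the " + seps.length + " available separators");
        }
        return seps[level];
    }
}
```

With 8 entries, level 8 has no separator, which matches the report that tables nested more than roughly 8-10 levels cannot be queried.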

 Reinvestigate HIVE-337 induced limit on number of separator characters in 
 LazySerDe
 ---

 Key: HIVE-4571
 URL: https://issues.apache.org/jira/browse/HIVE-4571
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: Harsh J
Priority: Minor

 HIVE-337 added support for complex data structures and also, oddly, added a 
 limit on the number of separator characters required to make that happen.
 When using an Avro-based table that has more than 8-10 levels of nesting in 
 records, this limit is hit and such tables can't be queried.
 We need to either remove the limit or raise it to a value high enough to 
 support such nested data structures.


Re: admin rights for Hive jira

2013-05-16 Thread Xuefu Zhang

Hi Carl,

As a side question, though not a committer, I expect to be able to assign a
JIRA to myself, especially for those created by me, so as to let other
people know that the JIRA is being worked on. Is this reasonable?

I can see other projects give this privilege to non-committers. Could you
please comment?

Thanks,
Xuefu


On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com wrote:

 Done. I also made sure that all of the other Hive committers have admin
 privileges.

 Thanks.

 Carl


 On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org wrote:

  Carl,
 Please give me admin rights in Hive's jira so that I can close the
  0.11.0 release and create the 0.11.1 target as described in the
  HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html).
  I'm a jira admin on 11 other Apache projects and I believe I created the
  Hive jira originally. *smile*
 
  Thanks,
 Owen

Re: admin rights for Hive jira

2013-05-16 Thread Edward Capriolo

While you guys are on the topic, I am on the pmc and I do not think I can
edit my own comments.


On Thu, May 16, 2013 at 9:50 AM, Xuefu Zhang xzh...@cloudera.com wrote:

 Hi Carl,

 As a side question, though not a committer, I expect to be able to assign a
 JIRA to myself, especially for those created by me, so as to let other
 people know that the JIRA is being worked on. Is this reasonable?

 I can see other projects give this privilege to non-committers. Could you
 please comment?

 Thanks,
 Xuefu


 On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com
 wrote:

  Done. I also made sure that all of the other Hive committers have admin
  privileges.
 
  Thanks.
 
  Carl
 
 
  On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org
 wrote:
 
   Carl,
  Please give me admin rights in Hive's jira so that I can close the
   0.11.0 release and create the 0.11.1 target as described in the
   HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html).
   I'm a jira admin on 11 other Apache projects and I believe I created the
   Hive jira originally. *smile*
  
   Thanks,
  Owen

[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-16 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4554:


Fix Version/s: (was: 0.11.0)
   0.11.1

 Failed to create a table from existing file if file path has spaces
 ---

 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
 Fix For: 0.11.1

 Attachments: HIVE-4554.patch, HIVE-4554.patch.1


 To reproduce the problem:
 1. Create a table, say, person_age (name STRING, age INT).
 2. Create a file whose name has a space in it, say, data set.txt.
 3. Try to load the data in the file into the table.
 The following error can be seen in the console:
 hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
 Loading data to table default.person_age
 Failed with exception Wrong file format. Please check the file's format.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 Note: the error message is confusing.
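The report does not say where Hive mishandles the path, so the following is only an illustration of a common pitfall with such paths, not a diagnosis of HIVE-4554: a raw string containing a space is not a legal URI, while `java.net.URI`'s multi-argument constructor percent-encodes it. `PathWithSpaces` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: how a path with a space behaves when turned into a URI.
final class PathWithSpaces {
    // Parsing the raw string fails, because ' ' is not allowed in a URI.
    static boolean parsesAsRawUri(String path) {
        try {
            new URI("file://" + path);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The (scheme, authority, path, query, fragment) constructor quotes illegal characters.
    static URI toEncodedUri(String path) {
        try {
            return new URI("file", null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

`toEncodedUri("/home/xzhang/temp/data set.txt")` yields `file:/home/xzhang/temp/data%20set.txt`, and `getPath()` decodes it back to the original path, so code that mixes encoded and decoded forms can end up looking for a file that does not exist.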


[jira] [Updated] (HIVE-4565) TestCliDriver and TestParse fail with non Sun Java

2013-05-16 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4565:


Fix Version/s: (was: 0.11.0)
   0.11.1

 TestCliDriver and TestParse fail with non Sun Java
 --

 Key: HIVE-4565
 URL: https://issues.apache.org/jira/browse/HIVE-4565
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.11.0
 Environment: RedHat x86 IBM Java 6
Reporter: Renata Ghisloti Duarte de Souza
Priority: Minor
 Fix For: 0.11.1

 Attachments: HIVE-4565.patch


 While executing Hive's unit tests, two test cases have different outputs with 
 Sun Java and non-Sun Java (such as IBM):
 TestCliDriver and TestParse.
 The differences are mainly due to the use of HashMaps in the creation of the 
 logical plan in the analyzeInternal method. Sun Java iterates over the elements of a 
 HashMap in one order, and non-Sun Java in a different order.
 Both outputs are correct and don't affect the final query result. I propose 
 the attached patch to make Hive unit tests compliant with all JVMs.
 The patch adds the output files and a change to ql/build.xml.
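The root cause is that `HashMap` iteration order is unspecified and may differ between JVM vendors and versions. One alternative to per-JVM expected-output files (not what the attached patch does) is to impose a deterministic order before printing, for example by copying into a `TreeMap`. `DeterministicPlanOutput` is a hypothetical helper used only to show the idea.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration: rendering map contents in sorted key order so the text is
// identical on every JVM, whatever order the source HashMap iterates in.
final class DeterministicPlanOutput {
    static String render(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(entries).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Using a `LinkedHashMap` at construction time is another option when insertion order, rather than sorted order, is the natural one for the plan.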


Re: admin rights for Hive jira

2013-05-16 Thread Owen O'Malley

On Thu, May 16, 2013 at 6:50 AM, Xuefu Zhang xzh...@cloudera.com wrote:

 Hi Carl,

 As a side question, though not a committer, I expect to be able to assign a
 JIRA to myself, especially for those created by me, so as to let other
 people know that the JIRA is being worked on. Is this reasonable?


I've added you to the contributors list. You should be able to assign jiras
to yourself.

-- Owen



 I can see other projects give this privilege to non-committers. Could you
 please comment?

 Thanks,
 Xuefu


 On Thu, May 16, 2013 at 12:51 AM, Carl Steinbach cwsteinb...@gmail.com
 wrote:

  Done. I also made sure that all of the other Hive committers have admin
  privileges.
 
  Thanks.
 
  Carl
 
 
  On Wed, May 15, 2013 at 8:59 PM, Owen O'Malley omal...@apache.org
 wrote:
 
   Carl,
  Please give me admin rights in Hive's jira so that I can close the
   0.11.0 release and create the 0.11.1 target as described in the
   HowToRelease wiki page (https://cwiki.apache.org/Hive/howtorelease.html).
   I'm a jira admin on 11 other Apache projects and I believe I created the
   Hive jira originally. *smile*
  
   Thanks,
  Owen

Re: admin rights for Hive jira

2013-05-16 Thread Owen O'Malley

On Thu, May 16, 2013 at 7:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 While you guys are on the topic, I am on the pmc and I do not think I can
 edit my own comments.


With the Hadoop jira security model, which is what Hive's jira uses, only
admins can edit comments. Now that Carl granted you admin rights, you can
edit comments. Use it judiciously, especially if others have responded to
your comment since it can make the conversation difficult to follow.

-- Owen

[jira] [Assigned] (HIVE-4566) NullPointerException if typeinfo and nativesql commands are executed at beeline before a DB connection is established

2013-05-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-4566:
-

Assignee: Xuefu Zhang

 NullPointerException if typeinfo and nativesql commands are executed at 
 beeline before a DB connection is established
 -

 Key: HIVE-4566
 URL: https://issues.apache.org/jira/browse/HIVE-4566
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 Before a DB connection is established, executing a command such as typeinfo 
 or nativesql results in an NPE shown at the console:
 beeline> !typeinfo
 java.lang.NullPointerException
 beeline> !nativesql
 java.lang.NullPointerException
 Instead, a message such as "No current connection" should be given, as is the 
 case for some other commands, such as dropall.
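A minimal sketch of the guard described above: check for a connection before running a command that needs one, and report a message instead of dereferencing null. `ConnectionGuard` and `SqlAction` are hypothetical names, not Beeline's classes; only `java.sql.Connection` methods that exist (such as `nativeSQL` and `getMetaData`) are used in the usage lines.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch (not Beeline's code): commands that need a connection
// go through one guard, so a missing connection yields a message, not an NPE.
final class ConnectionGuard {
    static final String NO_CONNECTION = "No current connection";

    @FunctionalInterface
    interface SqlAction {
        String run(Connection conn) throws SQLException;
    }

    static String runCommand(Connection conn, SqlAction action) {
        if (conn == null) {
            return NO_CONNECTION;
        }
        try {
            return action.run(conn);
        } catch (SQLException e) {
            return "Error: " + e.getMessage();
        }
    }
}
```

For example, a `!nativesql` handler could call `ConnectionGuard.runCommand(conn, c -> c.nativeSQL(sql))`, which returns "No current connection" when `conn` is null.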


[jira] [Assigned] (HIVE-4568) Beeline needs to support resolving variables

2013-05-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-4568:
-

Assignee: Xuefu Zhang

 Beeline needs to support resolving variables
 

 Key: HIVE-4568
 URL: https://issues.apache.org/jira/browse/HIVE-4568
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor

 Beeline currently doesn't support variable (system, env, etc.) substitution as 
 the hive client does. Supporting this feature will certainly make it more usable.


[jira] [Assigned] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-4554:
-

Assignee: Xuefu Zhang

 Failed to create a table from existing file if file path has spaces
 ---

 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.11.1

 Attachments: HIVE-4554.patch, HIVE-4554.patch.1


 To reproduce the problem:
 1. Create a table, say, person_age (name STRING, age INT).
 2. Create a file whose name has a space in it, say, data set.txt.
 3. Try to load the data in the file into the table.
 The following error can be seen in the console:
 hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
 Loading data to table default.person_age
 Failed with exception Wrong file format. Please check the file's format.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 Note: the error message is confusing.


Re: Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread tony murphy



 On May 14, 2013, 6:13 p.m., Eric Hanson wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt,
   line 1
  https://reviews.apache.org/r/11133/diff/2/?file=291141#file291141line1
 
  test .txt templates need apache license

The generated class will have the Apache license (see TestClass.txt), but if I 
add it here, every test case would have it, which would bloat the class 
with tons of comments. Is it necessary, or is there another way to specify it, 
like maybe a license.txt in the directory?


- tony


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/#review20538
---


On May 14, 2013, 12:34 a.m., tony murphy wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11133/
 ---
 
 (Updated May 14, 2013, 12:34 a.m.)
 
 
 Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, 
 and Remus Rusanu.
 
 
 Description
 ---
 
 This patch adds Column Column and Column Scalar vectorized execution tests. 
 These tests are generated in parallel with the vectorized expressions. The 
 tests focus on validating the column vector and the vectorized row batch 
 metadata regarding nulls, repeating, and selection.
 
 Overview of Changes:
 
 CodeGen.java:
 + joinPath, getCamelCaseType, readFile and writeFile made static for use in 
 TestCodeGen.java.
 + filter types now specify null as their output type rather than "doesn't 
 matter", to make detection for test generation easier.
 + support for test generation added.
 
 TestCodeGen.java  Templates: 
  TestClass.txt
  TestColumnColumnFilterVectorExpressionEvaluation.txt,
  TestColumnColumnOperationVectorExpressionEvaluation.txt,
  TestColumnScalarFilterVectorExpressionEvaluation.txt,
  TestColumnScalarOperationVectorExpressionEvaluation.txt
 +This class is mutable and maintains a hashmap of TestSuiteClassName to test 
 cases. The test cases are added over the course of vectorized expression 
 class generation, and the test classes are written out at the end. For each 
 column vector (inputs and/or outputs), a matrix of pairwise-covering Booleans 
 is used to generate test cases across the nulls and repeating dimensions. Based 
 on the nulls and repeating states of the input column vector(s), the state of 
 the output column vector (if there is one) is validated, along with its null 
 vector. For filter operations, the selection vector is validated against the 
 generated data. Each template corresponds to a class representing a test 
 suite.
 
 VectorizedRowGroupGenUtil.java
 +added methods generateLongColumnVector and generateDoubleColumnVector for 
 generating the respective column vectors with optional nulls and/or repeating 
 values.
 
 
 This addresses bug HIVE-4553.
 https://issues.apache.org/jira/browse/HIVE-4553
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
  53d9a7a 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
  8a07567 
 
 Diff: https://reviews.apache.org/r/11133/diff/
 
 
 Testing
 ---
 
 generated tests, and ran them.
 
 
 Thanks,
 
 tony murphy

[jira] [Commented] (HIVE-4472) OR, NOT Filter logic can lose an array, and always takes time O(VectorizedRowBatch.DEFAULT_SIZE)

2013-05-16 Thread Eric Hanson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659757#comment-13659757
 ] 

Eric Hanson commented on HIVE-4472:
---

I posted additional comments to Review Board. The patch is almost ready, but 
not quite. Expect one more update from Jitendra.

 OR, NOT Filter logic can lose an array, and always takes time 
 O(VectorizedRowBatch.DEFAULT_SIZE)
 

 Key: HIVE-4472
 URL: https://issues.apache.org/jira/browse/HIVE-4472
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4472.1.patch, HIVE-4472.2.patch, HIVE-4472.3.patch


 The issue is in files FilterExprOrExpr.java and FilterNotExpr.java.
 I posted a review for you at 
 https://reviews.apache.org/r/10752/
 I think there is a bug related to sharing of an array of integers. Also, one 
 algorithm step always takes O(DEFAULT_BATCH_SIZE) time. If 
 n < DEFAULT_BATCH_SIZE, then this is a performance issue. 
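To illustrate the two concerns (an int array shared between filters, and a step that always costs O(DEFAULT_BATCH_SIZE)), here is a minimal sketch of an OR over two child filters. It assumes each child reports the sorted row indices it selected; OrSelectionMerge and union() are hypothetical names, not the FilterExprOrExpr code under review. The union is a linear merge costing O(n1 + n2) rather than O(DEFAULT_BATCH_SIZE), and the result is written into a freshly allocated array, so it never aliases either child's array.

```java
import java.util.Arrays;

/** Hypothetical sketch: OR-combine two sorted selection vectors. */
public final class OrSelectionMerge {

  /**
   * Merges the first aLen entries of a and the first bLen entries of b
   * (both sorted ascending) into a new array holding their union.
   */
  static int[] union(int[] a, int aLen, int[] b, int bLen) {
    int[] out = new int[aLen + bLen];  // fresh array: no sharing with the inputs
    int i = 0, j = 0, k = 0;
    while (i < aLen && j < bLen) {
      if (a[i] < b[j]) {
        out[k++] = a[i++];
      } else if (a[i] > b[j]) {
        out[k++] = b[j++];
      } else {  // row passed both children: emit it once
        out[k++] = a[i++];
        j++;
      }
    }
    while (i < aLen) {
      out[k++] = a[i++];
    }
    while (j < bLen) {
      out[k++] = b[j++];
    }
    return Arrays.copyOf(out, k);  // trim to the number of selected rows
  }

  public static void main(String[] args) {
    int[] left = {1, 4, 7};   // rows selected by the left child
    int[] right = {2, 4, 9};  // rows selected by the right child
    System.out.println(Arrays.toString(union(left, left.length, right, right.length)));
  }
}
```

With these inputs, main prints [1, 2, 4, 7, 9]; the work done depends only on how many rows the children selected.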

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-16 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659762#comment-13659762
 ] 

Owen O'Malley commented on HIVE-4486:
-

+1. Can you run the unit tests?

 FetchOperator slows down SMB map joins by 50% when there are many partitions
 

 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor
 Attachments: HIVE-4486.patch, smb-profile.html


 While looking at log files for SMB joins in Hive, I noticed that the 
 actual join operator did not account for a significant fraction of the time 
 spent. Most of the time was spent parsing configuration files.
 To confirm, I put log lines in the HiveConf constructor and eventually made 
 the following edit to the code:
 {code}
 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException {
      * @return list of file status entries
      */
     private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException {
 -    HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
 -    boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
 +    boolean recursive = false;
      if (!recursive) {
        return fs.listStatus(p);
      }
 {code}
 I then re-ran my query to compare timings:
 || ||Before||After||
 |Cumulative CPU|731.07 sec|386.0 sec|
 |Total time|347.66 sec|218.855 sec|
 The query used was:
 {code}
 INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/'
 select inv_item_sk
 from
  inventory inv
  join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk)
 limit 10
 ;
 {code}
 It ran on a scale=2 TPC-DS data set, where both store_sales and inventory are 
 bucketed into 4 buckets, with store_sales split into 7 partitions and 
 inventory into 261 partitions.
 78% of all CPU time was spent within new HiveConf(). The YourKit profiler 
 runs are attached.
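The experiment above hard-codes recursive = false to remove the per-path HiveConf construction. A common alternative, sketched here under stated assumptions, is to pay for the expensive configuration parse at most once and reuse the flag for every partition. parseConfiguration stands in for new HiveConf(job, ...), the property name mapred.input.dir.recursive is assumed to be what HADOOPMAPREDINPUTDIRRECURSIVE refers to, and RecursiveFlagCache is a hypothetical class, not the contents of the attached HIVE-4486.patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: memoize a flag that is expensive to read. */
public final class RecursiveFlagCache {
  /** Number of times the expensive parse ran; exposed only for illustration. */
  static int expensiveParses = 0;

  /** Stand-in for new HiveConf(job, ...): imagine this re-reads config files. */
  static Map<String, String> parseConfiguration(Map<String, String> job) {
    expensiveParses++;
    return new HashMap<>(job);
  }

  private final Map<String, String> job;
  private Boolean recursive;  // null until first requested

  RecursiveFlagCache(Map<String, String> job) {
    this.job = job;
  }

  /** Parses the configuration at most once, then returns the cached flag. */
  boolean isRecursive() {
    if (recursive == null) {
      Map<String, String> conf = parseConfiguration(job);
      // Assumed property name; defaults to false when absent.
      recursive = Boolean.parseBoolean(conf.getOrDefault("mapred.input.dir.recursive", "false"));
    }
    return recursive;
  }

  public static void main(String[] args) {
    RecursiveFlagCache cache = new RecursiveFlagCache(new HashMap<>());
    for (int partition = 0; partition < 261; partition++) {  // one call per partition
      cache.isRecursive();
    }
    System.out.println("expensive parses: " + expensiveParses);
  }
}
```

Across 261 simulated partitions, main prints "expensive parses: 1", while the configured value is still honored.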


[jira] [Created] (HIVE-4572) ColumnPruner cannot preserve RS key columns in columnExprMap

2013-05-16 Thread Yin Huai (JIRA)

Yin Huai created HIVE-4572:
--

 Summary: ColumnPruner cannot preserve RS key columns in 
columnExprMap
 Key: HIVE-4572
 URL: https://issues.apache.org/jira/browse/HIVE-4572
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai


For an RS (ReduceSinkOperator) of a join operator, if the join key corresponding 
to this RS does not appear in the SELECT clause, ColumnPruner drops the entry for 
this column from colExprMap. 

Example:
{code}
SELECT x.key FROM src1 x JOIN src y ON (x.key = y.key);
{code}
Before CP (ColumnPruner):
{code}
colExprMap of RS corresponding to x: {VALUE._col3=Column[INPUT__FILE__NAME], 
VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], 
VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {VALUE._col3=Column[INPUT__FILE__NAME], 
VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], 
VALUE._col0=Column[key]}.
{code}
After CP:
{code}
colExprMap of RS corresponding to x: {VALUE._col0=Column[key]};
colExprMap of RS corresponding to y: {}.
{code}
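The desired behavior can be expressed as a pruning rule that keeps a colExprMap entry either when its output column is still needed downstream or when its source column is a join key of this RS. This is a minimal sketch only: column expressions are modeled as plain strings rather than ExprNodeDesc objects, and PruneColExprMap and prune() are hypothetical names, not ColumnPruner code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: prune an RS colExprMap without losing join keys. */
public final class PruneColExprMap {

  /**
   * Keeps an entry if its output column is still needed downstream, or if the
   * input column it comes from is one of this RS's join keys.
   */
  static Map<String, String> prune(Map<String, String> colExprMap,
                                   Set<String> neededOutputs,
                                   Set<String> joinKeyColumns) {
    Map<String, String> pruned = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : colExprMap.entrySet()) {
      boolean needed = neededOutputs.contains(e.getKey());
      boolean isJoinKey = joinKeyColumns.contains(e.getValue());
      if (needed || isJoinKey) {
        pruned.put(e.getKey(), e.getValue());
      }
    }
    return pruned;
  }

  public static void main(String[] args) {
    // colExprMap of the RS for y before pruning, as in the example.
    Map<String, String> y = new LinkedHashMap<>();
    y.put("VALUE._col3", "INPUT__FILE__NAME");
    y.put("VALUE._col2", "BLOCK__OFFSET__INSIDE__FILE");
    y.put("VALUE._col1", "value");
    y.put("VALUE._col0", "key");
    // Nothing from y is selected, but y.key is the join key, so it survives.
    System.out.println(prune(y, Collections.emptySet(), Collections.singleton("key")));
  }
}
```

For y, main prints {VALUE._col0=key} instead of the empty map reported after CP.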




[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns in columnExprMap

2013-05-16 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4572:
---

Attachment: HIVE-4572.replay.patch

To see the problem, apply HIVE-4572.replay.patch and execute:
{code}
ant test -Dtestcase=TestCliDriver -Dqfile=RSKeyLostAfterCP.q
{code}
Then search build/ql/tmp/hive.log for "In pruneReduceSinkOperator oldMap" and 
"In pruneReduceSinkOperator newMap" to see the problem.

 ColumnPruner cannot preserve RS key columns in columnExprMap
 

 Key: HIVE-4572
 URL: https://issues.apache.org/jira/browse/HIVE-4572
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4572.replay.patch



[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap

2013-05-16 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4572:
---

Summary: ColumnPruner cannot preserve RS key columns corresponding to 
un-selected join keys in columnExprMap  (was: ColumnPruner cannot preserve RS 
key columns in columnExprMap)

 ColumnPruner cannot preserve RS key columns corresponding to un-selected join 
 keys in columnExprMap
 ---

 Key: HIVE-4572
 URL: https://issues.apache.org/jira/browse/HIVE-4572
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4572.replay.patch



[jira] [Commented] (HIVE-4403) Running Hive queries on Yarn (MR2) gives warnings related to overriding final parameters

2013-05-16 Thread Chu Tong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659782#comment-13659782
 ] 

Chu Tong commented on HIVE-4403:


Could someone please help review this patch? Thanks.

 Running Hive queries on Yarn (MR2) gives warnings related to overriding final 
 parameters
 

 Key: HIVE-4403
 URL: https://issues.apache.org/jira/browse/HIVE-4403
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mark Grover
 Attachments: HIVE-4403.patch


 While working on BIGTOP-885, I saw that Hive was giving a bunch of warnings 
 related to overriding final parameters in job.conf. This was on a 
 pseudo-distributed cluster. FWIW, I didn't see this happen on a 
 fully distributed cluster. Perhaps Hive's job.conf is overriding some final 
 parameters it shouldn't.
 Here is what the warnings looked like:
 {code}
 2013-04-19 14:20:32,304 WARN  [main] conf.Configuration 
 (Configuration.java:loadProperty(2032)) - 
 file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.retry.interval;  Ignoring.
 2013-04-19 14:20:32,367 WARN  [main] conf.Configuration 
 (Configuration.java:loadProperty(2032)) - 
 file:/tmp/root/hive_2013-04-19_14-20-30_159_5701876916688815815/-local-10002/jobconf.xml:an
  attempt to override final parameter: 
 mapreduce.job.end-notification.max.attempts;  Ignoring.
 {code}
 To reproduce, run a query like:
 {code}
 CREATE TABLE u_data (
   userid INT,
   movieid INT,
   rating INT,
   unixtime STRING)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 STORED AS TEXTFILE;
 {code}
 Load some data into u_data; here is some sample data:
 https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hive/src/main/resources/seed_data_files/ml-data/u.data
 Run a simple query on that data (on YARN/MR2):
 {code}
 INSERT OVERWRITE DIRECTORY '/tmp/count'
 SELECT COUNT(1) FROM u_data;
 {code}
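The warning reflects how Hadoop configuration loading treats properties marked final: once a resource declares a value final, a later resource (here Hive's jobconf.xml) cannot change it, and the attempt is logged and ignored. The following is a simplified model of that rule, not Hadoop's Configuration class; FinalAwareConfig, the resource names, and the values 5 and 1 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of loading properties that may be final. */
public final class FinalAwareConfig {
  private final Map<String, String> props = new HashMap<>();
  private final Set<String> finalKeys = new HashSet<>();
  final List<String> warnings = new ArrayList<>();

  /** Loads one property from a named resource, honoring earlier final marks. */
  void load(String resource, String key, String value, boolean isFinal) {
    if (finalKeys.contains(key)) {
      if (!value.equals(props.get(key))) {
        warnings.add(resource + ":an attempt to override final parameter: " + key + ";  Ignoring.");
      }
      return;  // the earlier final value wins
    }
    props.put(key, value);
    if (isFinal) {
      finalKeys.add(key);
    }
  }

  String get(String key) {
    return props.get(key);
  }

  public static void main(String[] args) {
    FinalAwareConfig conf = new FinalAwareConfig();
    String key = "mapreduce.job.end-notification.max.attempts";
    conf.load("mapred-site.xml", key, "5", true);  // site config marks it final
    conf.load("jobconf.xml", key, "1", false);     // job config tries to override it
    System.out.println(conf.get(key));             // 5
    conf.warnings.forEach(System.out::println);
  }
}
```

In this model the warning is harmless for correctness (the final value is kept); it disappears only if the later resource stops writing a different value for the final key.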


[jira] [Commented] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-16 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659785#comment-13659785
 ] 

Gopal V commented on HIVE-4486:
---

I have already run all of the tests in ql/ against svn (Wed May 8).

Will run all tests in a while and report back.


Re: admin rights for Hive jira

2013-05-16 Thread Edward Capriolo

Thanks. I will only use it to correct my typos. I tend to make many of
them. :)


On Thu, May 16, 2013 at 11:56 AM, Owen O'Malley omal...@apache.org wrote:

 On Thu, May 16, 2013 at 7:58 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

  While you guys are on the topic, I am on the PMC and I do not think I can
  edit my own comments.


 With the Hadoop jira security model, which is what Hive's jira uses, only
 admins can edit comments. Now that Carl granted you admin rights, you can
 edit comments. Use it judiciously, especially if others have responded to
 your comment, since it can make the conversation difficult to follow.

 -- Owen

[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread Tony Murphy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4553:
--

Attachment: HIVE-4160.patch

 Column Column, and Column Scalar vectorized execution tests
 ---

 Key: HIVE-4553
 URL: https://issues.apache.org/jira/browse/HIVE-4553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4553.patch


 review board review: https://reviews.apache.org/r/11133/
 This patch adds Column Column and Column Scalar vectorized execution tests. 
 These tests are generated in parallel with the vectorized expressions. The 
 tests focus on validating the column vector and the vectorized row batch 
 metadata regarding nulls, repeating, and selection.
 Overview of Changes:
 CodeGen.java:
 + joinPath, getCamelCaseType, readFile and writeFile made static for use in 
 TestCodeGen.java.
 + filter types now specify null as their output type rather than "doesn't 
 matter", to make detection for test generation easier.
 + support for test generation added.
 TestCodeGen.java and templates: 
  TestClass.txt,
  TestColumnColumnFilterVectorExpressionEvaluation.txt,
  TestColumnColumnOperationVectorExpressionEvaluation.txt,
  TestColumnScalarFilterVectorExpressionEvaluation.txt,
  TestColumnScalarOperationVectorExpressionEvaluation.txt
 +This class is mutable and maintains a hashmap of TestSuiteClassName to test 
 cases. Test cases are added over the course of vectorized expression class 
 generation, and the test classes are written out at the end. For each 
 column vector (inputs and/or outputs), a matrix of pairwise-covering Booleans 
 is used to generate test cases across the nulls and repeating dimensions. 
 Based on the nulls and repeating states of the input column vector(s), the 
 state of the output column vector (if there is one) is validated, along with 
 the null vector. For filter operations, the selection vector is validated 
 against the generated data. Each template corresponds to a class 
 representing a test suite.
 VectorizedRowGroupGenUtil.java
 +added methods generateLongColumnVector and generateDoubleColumnVector for 
 generating the respective column vectors with optional nulls and/or repeating 
 values.
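The "matrix of pairwise covering Booleans" can be illustrated with a small greedy covering-array builder. With two input vectors the factors might be noNulls and isRepeating for each input; instead of all 16 combinations, rows are added only until every pair of factors has seen all four value combinations. This is a hypothetical sketch of the general technique (PairwiseBooleans is not a class from the patch), and TestCodeGen.java may build its matrix differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: greedy pairwise covering of Boolean factors. */
public final class PairwiseBooleans {

  /**
   * Builds rows of factor values such that, for every pair of factors (i, j),
   * all four combinations (false,false), (false,true), (true,false), (true,true) appear.
   */
  static List<boolean[]> cover(int factors) {
    // uncovered[i][j][v] stays true while combination v = 2*a + b of factors i < j is unseen
    boolean[][][] uncovered = new boolean[factors][factors][4];
    int remaining = 0;
    for (int i = 0; i < factors; i++) {
      for (int j = i + 1; j < factors; j++) {
        for (int v = 0; v < 4; v++) {
          uncovered[i][j][v] = true;
          remaining++;
        }
      }
    }

    List<boolean[]> rows = new ArrayList<>();
    while (remaining > 0) {
      // Try every candidate row (bit f of mask = value of factor f) and keep
      // the one that covers the most still-uncovered pairs.
      int bestMask = 0;
      int bestGain = -1;
      for (int mask = 0; mask < (1 << factors); mask++) {
        int gain = 0;
        for (int i = 0; i < factors; i++) {
          for (int j = i + 1; j < factors; j++) {
            if (uncovered[i][j][pairValue(mask, i, j)]) {
              gain++;
            }
          }
        }
        if (gain > bestGain) {
          bestGain = gain;
          bestMask = mask;
        }
      }
      boolean[] row = new boolean[factors];
      for (int f = 0; f < factors; f++) {
        row[f] = ((bestMask >> f) & 1) == 1;
      }
      for (int i = 0; i < factors; i++) {
        for (int j = i + 1; j < factors; j++) {
          int v = pairValue(bestMask, i, j);
          if (uncovered[i][j][v]) {
            uncovered[i][j][v] = false;
            remaining--;
          }
        }
      }
      rows.add(row);
    }
    return rows;
  }

  private static int pairValue(int mask, int i, int j) {
    return 2 * ((mask >> i) & 1) + ((mask >> j) & 1);
  }

  public static void main(String[] args) {
    // Factors: in1.noNulls, in1.isRepeating, in2.noNulls, in2.isRepeating
    for (boolean[] row : cover(4)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Each iteration covers at least one new pair, so the loop terminates, and no candidate row is chosen twice, so the result never exceeds the exhaustive 2^factors rows.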


[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread Tony Murphy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4553:
--

Attachment: (was: HIVE-4160.patch)


Re: Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread tony murphy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/
---

(Updated May 16, 2013, 6:47 p.m.)


Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and 
Remus Rusanu.


Changes
---

Updated based on review feedback.


Description
---

This patch adds Column Column and Column Scalar vectorized execution tests. 
These tests are generated in parallel with the vectorized expressions. The 
tests focus on validating the column vector and the vectorized row batch 
metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in 
TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't 
matter", to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java and templates: 
 TestClass.txt,
 TestColumnColumnFilterVectorExpressionEvaluation.txt,
 TestColumnColumnOperationVectorExpressionEvaluation.txt,
 TestColumnScalarFilterVectorExpressionEvaluation.txt,
 TestColumnScalarOperationVectorExpressionEvaluation.txt
+This class is mutable and maintains a hashmap of TestSuiteClassName to test 
cases. Test cases are added over the course of vectorized expression class 
generation, and the test classes are written out at the end. For each column 
vector (inputs and/or outputs), a matrix of pairwise-covering Booleans is used 
to generate test cases across the nulls and repeating dimensions. Based on the 
nulls and repeating states of the input column vector(s), the state of the 
output column vector (if there is one) is validated, along with the null 
vector. For filter operations, the selection vector is validated against the 
generated data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+added methods generateLongColumnVector and generateDoubleColumnVector for 
generating the respective column vectors with optional nulls and/or repeating 
values.
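To show what "optional nulls and/or repeating values" means for a generated vector, here is a minimal sketch built on a stand-in class. The field names vector, isNull, noNulls and isRepeating mirror Hive's column vectors, but LongVec and generateLongVector are hypothetical and do not reproduce generateLongColumnVector's actual signature or null placement (here every even-indexed filled slot is null).

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of generating a test column vector. */
public final class ColumnVectorGenSketch {

  /** Minimal stand-in for a long column vector; not Hive's LongColumnVector. */
  static final class LongVec {
    final long[] vector;
    final boolean[] isNull;
    boolean noNulls = true;
    boolean isRepeating = false;

    LongVec(int size) {
      vector = new long[size];
      isNull = new boolean[size];
    }
  }

  /**
   * Builds a vector of the given size. When repeating, only slot 0 is filled
   * (readers treat it as the value of every row). When nulls are requested,
   * every even-indexed filled slot is marked null and noNulls is cleared.
   */
  static LongVec generateLongVector(boolean nulls, boolean repeating, int size, Random rand) {
    LongVec v = new LongVec(size);
    v.isRepeating = repeating;
    v.noNulls = !nulls;
    int filled = repeating ? 1 : size;
    for (int i = 0; i < filled; i++) {
      v.vector[i] = rand.nextLong();
      v.isNull[i] = nulls && i % 2 == 0;
    }
    return v;
  }

  public static void main(String[] args) {
    Random rand = new Random(42);
    for (boolean nulls : new boolean[] {false, true}) {
      for (boolean repeating : new boolean[] {false, true}) {
        LongVec v = generateLongVector(nulls, repeating, 4, rand);
        System.out.println("nulls=" + nulls + " repeating=" + repeating
            + " isNull=" + Arrays.toString(v.isNull));
      }
    }
  }
}
```

The four (nulls, repeating) combinations printed by main are the states a test generator would pair across input vectors; with repeating and nulls both set, slot 0 is null, so the whole column reads as null.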


This addresses bug HIVE-4553.
https://issues.apache.org/jira/browse/HIVE-4553


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 53d9a7a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
 8a07567 

Diff: https://reviews.apache.org/r/11133/diff/


Testing
---

Generated the tests and ran them.


Thanks,

tony murphy

[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread Tony Murphy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4553:
--

Attachment: HIVE-4553 (2).patch

 Column Column, and Column Scalar vectorized execution tests
 ---

 Key: HIVE-4553
 URL: https://issues.apache.org/jira/browse/HIVE-4553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4553 (2).patch, HIVE-4553.patch



[jira] [Commented] (HIVE-4467) HiveConnection does not handle failures correctly

2013-05-16 Thread Thiruvel Thirumoolan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659904#comment-13659904
 ] 

Thiruvel Thirumoolan commented on HIVE-4467:


[~cwsteinbach] Does the updated patch look good?

 HiveConnection does not handle failures correctly
 -

 Key: HIVE-4467
 URL: https://issues.apache.org/jira/browse/HIVE-4467
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-4467_1.patch, HIVE-4467.patch


 HiveConnection uses Utils.verifySuccess* routines to check if there is any 
 error from the server side. This is not handled well. In 
 Utils.verifySuccess(), when withInfo is false, the condition evaluates to 
 false and no SQLException is thrown, even though there could be a problem on 
 the server.
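The intended check is that anything other than outright success raises an error, with SUCCESS_WITH_INFO tolerated only when the caller opts in. A minimal sketch of that logic, using a stand-in enum whose names are modeled on HiveServer2's Thrift status codes (VerifySuccessSketch is hypothetical and is not the HIVE-4467 patch):

```java
import java.sql.SQLException;

/** Hypothetical sketch of a status check that cannot silently pass errors. */
public final class VerifySuccessSketch {

  /** Stand-in for the server status codes (names modeled on HiveServer2's TStatusCode). */
  enum StatusCode {
    SUCCESS_STATUS, SUCCESS_WITH_INFO_STATUS, STILL_EXECUTING_STATUS, ERROR_STATUS, INVALID_HANDLE_STATUS
  }

  /**
   * Throws unless the call succeeded. SUCCESS_WITH_INFO_STATUS counts as success
   * only when withInfo is true; every other code is an error regardless of withInfo.
   */
  static void verifySuccess(StatusCode code, String message, boolean withInfo) throws SQLException {
    boolean ok = code == StatusCode.SUCCESS_STATUS
        || (withInfo && code == StatusCode.SUCCESS_WITH_INFO_STATUS);
    if (!ok) {
      throw new SQLException("Server returned " + code + ": " + message);
    }
  }

  public static void main(String[] args) {
    try {
      verifySuccess(StatusCode.ERROR_STATUS, "table not found", false);
      System.out.println("no exception");
    } catch (SQLException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Because withInfo only widens what counts as success, an ERROR_STATUS throws whether withInfo is true or false.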


[jira] [Commented] (HIVE-4467) HiveConnection does not handle failures correctly

2013-05-16 Thread Carl Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659916#comment-13659916
 ] 

Carl Steinbach commented on HIVE-4467:
--

Changes look good to me. +1.


[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2013-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659934#comment-13659934
 ] 

Hudson commented on HIVE-4440:
--

Integrated in Hive-trunk-hadoop2 #199 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/199/])
HIVE-4440 SMB Operator spills to disk like it's 1999 (Gunther Hagleitner via
omalley) (Revision 1483084)

 Result = FAILURE
omalley : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483084
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java


 SMB Operator spills to disk like it's 1999
 --

 Key: HIVE-4440
 URL: https://issues.apache.org/jira/browse/HIVE-4440
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch


 I was recently looking into a performance issue with a query that used an SMB 
 join and was running really slowly. It turns out that the SMB join by default 
 caches only 100 values per key before spilling to disk. That seems overly 
 conservative to me. Changing the parameter resulted in a ~5x speedup, which is 
 quite significant.
 The parameter is hive.mapjoin.bucket.cache.size, which, as far as I can tell, 
 is currently used only by the SMB operator.
 The parameter was originally introduced (3 years ago) for the map join operator 
 (apparently pre-SMB) and set to 100 to avoid OOM. That seems to have been in 
 a different context, though, where you had to avoid running out of memory with 
 the cached hash table in the same process, I think.
 Two things I'd like to propose:
 a) Rename it to what it does: hive.smbjoin.cache.rows
 b) Set it to something less restrictive: 1
 If you string together a 5-table SMB join with a map join and a map-side 
 group-by aggregation, you might still run out of memory, but the renamed 
 parameter should be easier to find and reduce. For most queries, I would 
 think that 1 is still a reasonable number to cache (on the reduce side we 
 use 25000 for shuffle joins).
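The proposal hinges on how many rows per key are held in memory before spilling. A minimal sketch of that cache-then-spill behavior, assuming rows are plain strings without newlines and one temporary file per key group (SpillingRowBuffer and its methods are hypothetical, not the row container SMBMapJoinOperator actually uses):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep up to cacheRows rows in memory, spill the rest. */
public final class SpillingRowBuffer implements AutoCloseable {
  private final int cacheRows;
  private final List<String> inMemory = new ArrayList<>();
  private Path spillFile;             // created lazily on the first spill
  private BufferedWriter spillWriter;
  private int spilled = 0;

  SpillingRowBuffer(int cacheRows) {
    this.cacheRows = cacheRows;
  }

  void add(String row) throws IOException {
    if (inMemory.size() < cacheRows) {
      inMemory.add(row);
      return;
    }
    if (spillWriter == null) {
      spillFile = Files.createTempFile("smb-spill", ".txt");
      spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
    }
    spillWriter.write(row);  // every row past the cache limit costs disk I/O
    spillWriter.newLine();
    spilled++;
  }

  int inMemoryCount() {
    return inMemory.size();
  }

  int spilledCount() {
    return spilled;
  }

  /** Returns all rows: the in-memory ones first, then those read back from disk. */
  List<String> readAll() throws IOException {
    List<String> all = new ArrayList<>(inMemory);
    if (spillWriter != null) {
      spillWriter.flush();
      all.addAll(Files.readAllLines(spillFile, StandardCharsets.UTF_8));
    }
    return all;
  }

  @Override
  public void close() throws IOException {
    if (spillWriter != null) {
      spillWriter.close();
      Files.deleteIfExists(spillFile);
    }
  }

  public static void main(String[] args) throws IOException {
    try (SpillingRowBuffer buf = new SpillingRowBuffer(100)) {  // the old default of 100
      for (int i = 0; i < 250; i++) {
        buf.add("row" + i);
      }
      System.out.println(buf.inMemoryCount() + " in memory, " + buf.spilledCount() + " spilled");
    }
  }
}
```

With the old limit of 100 and 250 rows for one key, main prints "100 in memory, 150 spilled"; raising the limit trades heap for fewer disk round trips.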


[jira] [Commented] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions

2013-05-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659935#comment-13659935
 ] 

Hudson commented on HIVE-4550:
--

Integrated in Hive-trunk-hadoop2 #199 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/199/])
HIVE-4550 local_mapred_error_cache fails on some hadoop versions (Gunther 
Hagleitner via omalley) (Revision 1483124)

 Result = FAILURE
omalley : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1483124
Files : 
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
* /hive/trunk/ql/src/test/results/clientnegative/local_mapred_error_cache.q.out


 local_mapred_error_cache fails on some hadoop versions
 --

 Key: HIVE-4550
 URL: https://issues.apache.org/jira/browse/HIVE-4550
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4550.1.patch


 I've tested it manually on the upcoming Hadoop 1.3 release (branch-1).
 We mask job_* ids, but not job_local* ids. The fix is to extend the masking to 
 both.
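A small Python sketch of masking both id shapes with one pattern. It is not the regex that QTestUtil actually uses; the pattern, the `mask_job_ids` helper and the `job_MASKED` token are hypothetical, and the id shapes (`job_<timestamp>_<n>`, `job_local_<n>`, `job_local<digits>_<n>`) are assumptions based on ids such as `job_local_0001` in the build log below.

```python
import re

# Assumed id shapes: cluster ids like job_201305161350_0001 and local-runner
# ids like job_local_0001 or job_local1234567_0001.
JOB_ID = re.compile(r"\bjob_(?:local)?\d*_\d+\b")


def mask_job_ids(line, mask="job_MASKED"):
    """Replace every job id in a line of test output with a fixed token."""
    return JOB_ID.sub(mask, line)
```

Because `local` is an optional group in the same pattern, both kinds of id are handled by one substitution instead of two separate masking rules.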


[jira] [Updated] (HIVE-4543) Broken link in HCat 0.5 doc (Reader and Writer Interfaces)

2013-05-16 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-4543:
-

Status: Open  (was: Patch Available)

Lefty, the change looks good, but could you please attach the changed .xml 
file rather than the .html and .pdf files? Then I can rebuild the docs from 
that and check in the changes.

 Broken link in HCat 0.5 doc (Reader and Writer Interfaces)
 --

 Key: HIVE-4543
 URL: https://issues.apache.org/jira/browse/HIVE-4543
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Lefty Leverenz
Assignee: Lefty Leverenz
Priority: Minor
 Attachments: HIVE-4543.1.patch, HIVE-4543.2.patch, readerwriter.html, 
 readerwriter.pdf


 Due to HCatalog's move from the incubator to Hive, a link to 
 TestReaderWriter.java is broken at the end of the Reader and Writer 
 Interfaces doc for HCat 0.5 
 ([here|http://hive.apache.org/docs/hcat_r0.5.0/readerwriter.html#Complete+Example+Program]).
   This should be fixed in the html and pdf files.
 Thanks to Himanshu Bari for pointing this out.


Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #147

2013-05-16 Thread Apache Jenkins Server

See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/

--
[...truncated 62234 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2013-05-16 13:50:50,187 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-46_848_6460139522905130875/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/tmp/hive_job_log_jenkins_201305161350_1297667893.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Copying file: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt
[junit] PREHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-51_771_4365363466253624657/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/localscratchdir/hive_2013-05-16_13-50-51_771_4365363466253624657/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/147/artifact/hive/build/service/tmp/hive_job_log_jenkins_201305161350_2061596878.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable

[ANNOUNCE] Apache Hive 0.11.0 Released

2013-05-16 Thread Owen O'Malley

The Apache Hive team is proud to announce the release of Apache
Hive version 0.11.0.

The Apache Hive data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop, it provides:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS or in other
  data storage systems such as Apache HBase

* Query execution via MapReduce

For Hive release details and downloads, please visit:
http://hive.apache.org/releases.html

Hive 0.11.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Html&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team

Re: [ANNOUNCE] Apache Hive 0.11.0 Released

2013-05-16 Thread Dean Wampler

Congratulations!

On Thu, May 16, 2013 at 4:19 PM, Owen O'Malley omal...@apache.org wrote:

 The Apache Hive team is proud to announce the release of Apache
 Hive version 0.11.0.

 The Apache Hive data warehouse software facilitates querying and
 managing large datasets residing in distributed storage. Built on top
 of Apache Hadoop, it provides:

 * Tools to enable easy data extract/transform/load (ETL)

 * A mechanism to impose structure on a variety of data formats

 * Access to files stored either directly in Apache HDFS or in other
   data storage systems such as Apache HBase

 * Query execution via MapReduce

 For Hive release details and downloads, please visit:
 http://hive.apache.org/releases.html

 Hive 0.11.0 Release Notes are available here:


 https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Html&projectId=12310843

 We would like to thank the many contributors who made this release
 possible.

 Regards,

 The Apache Hive Team




-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com

[jira] [Updated] (HIVE-4572) ColumnPruner cannot preserve RS key columns corresponding to un-selected join keys in columnExprMap

2013-05-16 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4572:
---

Attachment: HIVE-4572.1.patch.txt

Added a patch. In 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory.pruneReduceSinkOperator,
 RS key columns will no longer be removed from columnExprMap. We still need to test whether this change 
works for HIVE-2206.

 ColumnPruner cannot preserve RS key columns corresponding to un-selected join 
 keys in columnExprMap
 ---

 Key: HIVE-4572
 URL: https://issues.apache.org/jira/browse/HIVE-4572
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4572.1.patch.txt, HIVE-4572.replay.patch


 For an RS of a join operator, if the join key corresponding to this RS does 
 not appear in the SELECT clause, ColumnPruner drops this column's entry from 
 colExprMap. 
 Example:
 {code}
 SELECT x.key FROM src1 x JOIN src y ON (x.key = y.key);
 {code}
 Before CP,
 {code}
 colExprMap of RS corresponding to x: {VALUE._col3=Column[INPUT__FILE__NAME], 
 VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], 
 VALUE._col0=Column[key]};
 colExprMap of RS corresponding to y: {VALUE._col3=Column[INPUT__FILE__NAME], 
 VALUE._col2=Column[BLOCK__OFFSET__INSIDE__FILE], VALUE._col1=Column[value], 
 VALUE._col0=Column[key]}.
 {code}
 After CP,
 {code}
 colExprMap of RS corresponding to x: {VALUE._col0=Column[key]};
 colExprMap of RS corresponding to y: {}.
 {code}
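A Python sketch of the pruning rule the patch aims for, not the actual ColumnPrunerProcFactory code: entries whose output column no downstream operator needs are dropped, except reduce-sink key columns, which are always kept. The `prune_col_expr_map` helper is hypothetical, and the `KEY.` prefix (as in `KEY.reducesinkkey0`) is an assumption about how RS key columns are named.

```python
def prune_col_expr_map(col_expr_map, needed_columns, key_prefix="KEY."):
    """Drop colExprMap entries for columns nobody downstream needs, but keep
    every reduce-sink key column so un-selected join keys survive pruning."""
    return {
        name: expr
        for name, expr in col_expr_map.items()
        if name in needed_columns or name.startswith(key_prefix)
    }
```

For the `y` side of the example query, where nothing from `y` is selected, the current behavior corresponds to an empty map; under this rule the key entry would remain.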


Re: Moving HCatalog docs to wiki

2013-05-16 Thread Carl Steinbach

Hi Lefty,


 Does my outline for the Hive wiki fit the bill?  Or should HCatalog docs be
 parceled out among existing Hive topics?  (Installation in Admin docs, and
 so on.)  If necessary, an HCatalog overview page could show where to find
 everything.


My personal preference is for parceling the HCatalog docs out among the
existing Hive topics along with an HCatalog overview page.

Thanks.

Carl

[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread Tony Murphy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4553:
--

Attachment: HIVE-4553 (3).patch

 Column Column, and Column Scalar vectorized execution tests
 ---

 Key: HIVE-4553
 URL: https://issues.apache.org/jira/browse/HIVE-4553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4553 (2).patch, HIVE-4553 (3).patch, HIVE-4553.patch


 review board review: https://reviews.apache.org/r/11133/
 This patch adds Column Column and Column Scalar vectorized execution tests. 
 These tests are generated alongside the vectorized expressions. The tests 
 focus on validating the column vector and the vectorized row batch 
 metadata regarding nulls, repeating values, and selection.
 Overview of Changes:
 CodeGen.java:
 + joinPath, getCamelCaseType, readFile and writeFile made static for use in 
 TestCodeGen.java.
 + filter types now specify null as their output type rather than "doesn't 
 matter", to make detection for test generation easier.
 + support for test generation added.
 TestCodeGen.java  Templates: 
  TestClass.txt
  TestColumnColumnFilterVectorExpressionEvaluation.txt,
  TestColumnColumnOperationVectorExpressionEvaluation.txt,
  TestColumnScalarFilterVectorExpressionEvaluation.txt,
  TestColumnScalarOperationVectorExpressionEvaluation.txt
 +This class is mutable and maintains a hashmap of TestSuiteClassName to test 
 cases. Test cases are added over the course of vectorized expression class 
 generation, and the test classes are written out at the end. For each 
 column vector (inputs and/or outputs), a matrix of pairwise-covering Booleans 
 is used to generate test cases across the nulls and repeating dimensions. Based 
 on the nulls and repeating states of the input column vector(s), the state of 
 the output column vector (if there is one) is validated, along with its null 
 vector. For filter operations, the selection vector is validated against the 
 generated data. Each template corresponds to a class representing a test 
 suite.
 VectorizedRowGroupGenUtil.java
 +added methods generateLongColumnVector and generateDoubleColumnVector for 
 generating the respective column vectors with optional nulls and/or repeating 
 values.
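A Python sketch of what "pairwise covering Booleans" means, not the TestCodeGen.java implementation: instead of every combination of the Boolean test dimensions, a greedy search picks assignments until every pair of dimensions has appeared in all four False/True combinations. The function name and the factor names (one noNulls and one isRepeating flag per input vector) are hypothetical.

```python
from itertools import combinations, product


def pairwise_boolean_cases(factors):
    """Greedily choose Boolean assignments until every pair of factors has
    been seen in all four (False/True) combinations."""
    n = len(factors)
    if n < 2:
        # nothing to pair up: just enumerate every assignment
        return [dict(zip(factors, vals)) for vals in product((False, True), repeat=n)]
    pairs = list(combinations(range(n), 2))
    uncovered = {(i, j, a, b) for i, j in pairs for a in (False, True) for b in (False, True)}
    candidates = list(product((False, True), repeat=n))
    cases = []
    while uncovered:
        # pick the assignment that covers the most still-uncovered pairs
        best = max(candidates,
                   key=lambda c: sum((i, j, c[i], c[j]) in uncovered for i, j in pairs))
        uncovered -= {(i, j, best[i], best[j]) for i, j in pairs}
        cases.append(dict(zip(factors, best)))
    return cases
```

For two input vectors (four flags) the full product has 16 cases, while a pairwise-covering set is noticeably smaller, which keeps the generated test suites compact.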


Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #374

2013-05-16 Thread Apache Jenkins Server

See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/374/

Re: Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-16 Thread Eric Hanson


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/#review20671
---



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
https://reviews.apache.org/r/11133/#comment42698

Either add a noNulls check, or add a comment noting that you have filled the 
isNull array with false when there are no nulls, so it is clear this is safe. 
Please do this in other templates too. In general, it is not safe to look at 
isNull entries if noNulls is true.
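A Python sketch of the null check the reviewer asks for. `ColumnVectorSketch` is a hypothetical stand-in that mirrors the noNulls / isNull / isRepeating metadata discussed here; it is not Hive's Java ColumnVector class. The point is that isNull is consulted only when noNulls is false, and a repeating vector keeps its single null flag in slot 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnVectorSketch:
    """Hypothetical stand-in for a vectorized column: values plus null metadata."""
    vector: List[int]
    is_null: List[bool]
    no_nulls: bool = True
    is_repeating: bool = False


def value_is_null(cv, i):
    """Null test for row i that never trusts is_null when no_nulls is set."""
    if cv.no_nulls:
        return False  # is_null may hold stale data, so do not read it
    # a repeating vector stores its single value (and null flag) in slot 0
    return cv.is_null[0 if cv.is_repeating else i]
```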



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
https://reviews.apache.org/r/11133/#comment42685

There are still some style issues. Please run ant checkstyle on the output of the 
template generation and update the templates to get rid of the issues, e.g. 
}while



- Eric Hanson


On May 16, 2013, 6:47 p.m., tony murphy wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11133/
 ---
 
 (Updated May 16, 2013, 6:47 p.m.)
 
 
 Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, 
 and Remus Rusanu.
 
 
 Description
 ---
 
 This patch adds Column Column and Column Scalar vectorized execution tests. 
 These tests are generated alongside the vectorized expressions. The tests 
 focus on validating the column vector and the vectorized row batch 
 metadata regarding nulls, repeating values, and selection.
 
 Overview of Changes:
 
 CodeGen.java:
 + joinPath, getCamelCaseType, readFile and writeFile made static for use in 
 TestCodeGen.java.
 + filter types now specify null as their output type rather than "doesn't 
 matter", to make detection for test generation easier.
 + support for test generation added.
 
 TestCodeGen.java  Templates: 
  TestClass.txt
  TestColumnColumnFilterVectorExpressionEvaluation.txt,
  TestColumnColumnOperationVectorExpressionEvaluation.txt,
  TestColumnScalarFilterVectorExpressionEvaluation.txt,
  TestColumnScalarOperationVectorExpressionEvaluation.txt
 +This class is mutable and maintains a hashmap of TestSuiteClassName to test 
 cases. Test cases are added over the course of vectorized expression class 
 generation, and the test classes are written out at the end. For each 
 column vector (inputs and/or outputs), a matrix of pairwise-covering Booleans 
 is used to generate test cases across the nulls and repeating dimensions. Based 
 on the nulls and repeating states of the input column vector(s), the state of 
 the output column vector (if there is one) is validated, along with its null 
 vector. For filter operations, the selection vector is validated against the 
 generated data. Each template corresponds to a class representing a test 
 suite.
 
 VectorizedRowGroupGenUtil.java
 +added methods generateLongColumnVector and generateDoubleColumnVector for 
 generating the respective column vectors with optional nulls and/or repeating 
 values.
 
 
 This addresses bug HIVE-4553.
 https://issues.apache.org/jira/browse/HIVE-4553
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
  53d9a7a 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
  PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
  8a07567 
 
 Diff: https://reviews.apache.org/r/11133/diff/

[jira] [Updated] (HIVE-4568) Beeline needs to support resolving variables

2013-05-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-4568:
--

Attachment: HIVE-4568.patch

 Beeline needs to support resolving variables
 

 Key: HIVE-4568
 URL: https://issues.apache.org/jira/browse/HIVE-4568
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Attachments: HIVE-4568.patch


 Beeline currently doesn't support variable (system, env, etc.) substitution as 
 the Hive CLI does. Supporting this feature would make it noticeably more usable.
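A Python sketch of the kind of substitution being requested, not Beeline's or the Hive CLI's actual implementation. It expands `${hivevar:name}`, `${system:name}`, `${env:name}` and bare `${name}` references; treating a bare reference as a hivevar, leaving unknown references untouched, and not handling `${hiveconf:...}` are all assumptions of this sketch, and `substitute_vars` is a hypothetical helper.

```python
import os
import re

# ${name}, ${hivevar:name}, ${system:name} or ${env:name}
VAR_REF = re.compile(r"\$\{(?:(hivevar|system|env):)?([^}]+)\}")


def substitute_vars(text, hivevars, system_props, env=None):
    """Expand variable references in a query; unknown ones are left as-is."""
    env = os.environ if env is None else env

    def resolve(match):
        namespace, name = match.group(1), match.group(2)
        if namespace == "env":
            value = env.get(name)
        elif namespace == "system":
            value = system_props.get(name)
        else:  # bare ${name} is treated as a hivevar in this sketch
            value = hivevars.get(name)
        return match.group(0) if value is None else str(value)

    return VAR_REF.sub(resolve, text)
```

Leaving unresolved references in place (rather than replacing them with an empty string) makes a typo in a variable name show up as a visible error in the query text.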


[jira] [Updated] (HIVE-4568) Beeline needs to support resolving variables

2013-05-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-4568:
--

Fix Version/s: 0.11.1
   Status: Patch Available  (was: Open)

The attached patch addresses the issue.

 Beeline needs to support resolving variables
 

 Key: HIVE-4568
 URL: https://issues.apache.org/jira/browse/HIVE-4568
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.11.1

 Attachments: HIVE-4568.patch


 Beeline currently doesn't support variable (system, env, etc.) substitution as 
 the Hive CLI does. Supporting this feature would make it noticeably more usable.

