[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729135#comment-13729135 ]

Hive QA commented on HIVE-4870:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595856/HIVE-4870.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2759 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/303/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/303/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

Explain Extended to show partition info for Fetch Task
------------------------------------------------------

Key: HIVE-4870
URL: https://issues.apache.org/jira/browse/HIVE-4870
Project: Hive
Issue Type: Bug
Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Fix For: 0.11.1
Attachments: HIVE-4870.patch

Explain extended does not include partition information for the Fetch Task (FetchWork); the Map Reduce Task (MapredWork) already does this. The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4970) BinaryConverter does not respect nulls
[ https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729370#comment-13729370 ]

Hudson commented on HIVE-4970:
------------------------------

ABORTED: Integrated in Hive-trunk-h0.21 #2245 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2245/])

HIVE-4970 BinaryConverter does not respect null (Mark Wagner via egc)
Submitted by: Mark Wagner
Reviewed by: Edward Capriolo
(ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510263)

* /hive/trunk/ql/src/test/queries/clientpositive/ba_table_udfs.q
* /hive/trunk/ql/src/test/results/clientpositive/ba_table_udfs.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java

BinaryConverter does not respect nulls
--------------------------------------

Key: HIVE-4970
URL: https://issues.apache.org/jira/browse/HIVE-4970
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Mark Wagner
Assignee: Mark Wagner
Fix For: 0.12.0
Attachments: HIVE-4970.1.patch, HIVE-4970.2.patch

Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not handle null values the same as the other converters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
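The bug description above implies the contract the patch restores: a converter should pass null through rather than try to convert it. Below is a minimal, Hive-free sketch of that "null in, null out" pattern; the real class works on ObjectInspectors, and the names here are illustrative only, not the actual PrimitiveObjectInspectorConverter API.

```java
import java.nio.charset.StandardCharsets;

public class BinaryConverterSketch {
    // Illustrative stand-in for a text-to-binary conversion: a null input
    // must yield a null output instead of being converted (or throwing),
    // matching the behavior of the other converters.
    static byte[] convert(String input) {
        if (input == null) {
            return null; // respect nulls
        }
        return input.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(convert(null));          // prints null
        System.out.println(convert("abc").length);  // prints 3
    }
}
```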
[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729369#comment-13729369 ]

Hudson commented on HIVE-4879:
------------------------------

ABORTED: Integrated in Hive-trunk-h0.21 #2245 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2245/])

Hive-4879 Window function that imply order can only be registered at compile time (Edward Capriolo)
Reviewed by: Brock Noland
(ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510269)

* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionDescription.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFFirstValue.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestFunctionRegistry.java

Window functions that imply order can only be registered at compile time
------------------------------------------------------------------------

Key: HIVE-4879
URL: https://issues.apache.org/jira/browse/HIVE-4879
Project: Hive
Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Fix For: 0.12.0
Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt, HIVE-4879.4.patch.txt

Adding an annotation for impliesOrder

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Access to trigger jobs on jenkins
Hi,

Are you looking to trigger the pre-commit builds? Unfortunately, to trigger *regular* builds you'd need an Apache username, according to the Apache Infra Jenkins page: http://wiki.apache.org/general/Jenkins

Brock

On Sun, Aug 4, 2013 at 1:37 PM, kulkarni.swar...@gmail.com wrote:
> Hello, I was wondering if it is possible to get access to be able to trigger jobs on the jenkins server? Or is that access limited to committers?
>
> Thanks,
> --
> Swarnim

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-4992) add ability to skip javadoc during build
[ https://issues.apache.org/jira/browse/HIVE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729490#comment-13729490 ]

Brock Noland commented on HIVE-4992:
------------------------------------

Hey this looks good! I think that hcat generates javadoc in a separate build.xml file. Can we add this to the hcat build as well?

add ability to skip javadoc during build
----------------------------------------

Key: HIVE-4992
URL: https://issues.apache.org/jira/browse/HIVE-4992
Project: Hive
Issue Type: Improvement
Components: Build Infrastructure
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
Attachments: HIVE-4992.D11967.1.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
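The usual Ant idiom for making a build step skippable is a property-guarded target. The fragment below is a hedged sketch of that pattern only; the property name `javadoc.skip` and target layout are illustrative assumptions, not the actual names used by the HIVE-4992 patch or Hive's build.xml.

```xml
<!-- Sketch: skip javadoc when -Djavadoc.skip=true is passed on the
     command line. Property and path names here are hypothetical. -->
<target name="javadoc" unless="javadoc.skip">
  <javadoc destdir="${build.dir}/javadoc" sourcepath="${src.dir}"/>
</target>
```

With Ant's `unless` attribute, the target runs normally by default and is skipped entirely when the property is set, e.g. `ant -Djavadoc.skip=true package`.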
Re: Access to trigger jobs on jenkins
Hi Brock,

Yes, I was looking to trigger the pre-commit builds without having to check in a new patch every time to auto-trigger them. I assumed they were similar to the *regular* builds?

On Mon, Aug 5, 2013 at 7:43 AM, Brock Noland br...@cloudera.com wrote:
> Hi,
>
> Are you looking to trigger the pre-commit builds? Unfortunately, to trigger *regular* builds you'd need an Apache username, according to the Apache Infra Jenkins page: http://wiki.apache.org/general/Jenkins
>
> Brock
>
> On Sun, Aug 4, 2013 at 1:37 PM, kulkarni.swar...@gmail.com wrote:
>> Hello, I was wondering if it is possible to get access to be able to trigger jobs on the jenkins server? Or is that access limited to committers?
>>
>> Thanks,
>> --
>> Swarnim
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

--
Swarnim
Re: Access to trigger jobs on jenkins
Hi,

The precommit builds are similar to regular builds, but I agree it'd be nice to allow people who may not have access to the Apache Jenkins to re-trigger the precommit build. Let's think about that. For now I'd just re-upload the same patch.

Brock

On Mon, Aug 5, 2013 at 8:35 AM, kulkarni.swar...@gmail.com wrote:
> Hi Brock,
>
> Yes, I was looking to trigger the pre-commit builds without having to check in a new patch every time to auto-trigger them. I assumed they were similar to the *regular* builds?
>
> --
> Swarnim

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Created] (HIVE-4997) HCatalog doesn't allow multiple input tables
Daniel Intskirveli created HIVE-4997:
-------------------------------------

Summary: HCatalog doesn't allow multiple input tables
Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-4388:
-------------------------------

Attachment: HIVE-4388.patch

TestE2EScenarios was not using a Shim. I have fixed this.

HBase tests fail against Hadoop 2
---------------------------------

Key: HIVE-4388
URL: https://issues.apache.org/jira/browse/HIVE-4388
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt

Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1), but fails with HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Affects Version/s: 0.12.0
Status: Patch Available  (was: Open)

Patch includes a new class, HCatMultipleInputs, which supports multiple table names, database names, partition filters, and mapper classes.

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Status: Open  (was: Patch Available)

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Attachment: HIVE-4997.patch

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4997.patch

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Status: Patch Available  (was: Open)

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4997.patch

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729596#comment-13729596 ]

Hive QA commented on HIVE-4388:
-------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12596124/HIVE-4388.patch

{color:green}SUCCESS:{color} +1 2759 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/304/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/304/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

HBase tests fail against Hadoop 2
---------------------------------

Key: HIVE-4388
URL: https://issues.apache.org/jira/browse/HIVE-4388
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt

Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1), but fails with HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Problem loading new UDTF in local hive copy
Hi,

I'm trying to compile hive with a new UDTF and have been following the wiki instructions (https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy). I've added my new function to the function registry and have successfully updated show_functions.q.out. However, when I recompile and start my local copy of hive with build/dist/bin/hive, the "show functions;" command still does not list my new function. Any thoughts on what I'm missing? Sorry if this is a naive question.

Thanks for your help,
Niko
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Attachment: HIVE-4997.patch1

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4997.patch, HIVE-4997.patch1

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Status: Open  (was: Patch Available)

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4997.patch, HIVE-4997.patch1

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Intskirveli updated HIVE-4997:
-------------------------------------

Status: Patch Available  (was: Open)

HCatalog doesn't allow multiple input tables
--------------------------------------------

Key: HIVE-4997
URL: https://issues.apache.org/jira/browse/HIVE-4997
Project: Hive
Issue Type: Improvement
Components: HCatalog
Affects Versions: 0.12.0
Reporter: Daniel Intskirveli
Priority: Minor
Fix For: 0.12.0
Attachments: HIVE-4997.patch, HIVE-4997.patch1

HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-4388:
-------------------------------

Attachment: HIVE-4388.patch

Simplified one ant condition.

HBase tests fail against Hadoop 2
---------------------------------

Key: HIVE-4388
URL: https://issues.apache.org/jira/browse/HIVE-4388
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt

Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1), but fails with HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729670#comment-13729670 ]

Hudson commented on HIVE-4879:
------------------------------

ABORTED: Integrated in Hive-trunk-hadoop2 #328 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/328/])

Hive-4879 Window function that imply order can only be registered at compile time (Edward Capriolo)
Reviewed by: Brock Noland
(ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510269)

* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionDescription.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFFirstValue.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestFunctionRegistry.java

Window functions that imply order can only be registered at compile time
------------------------------------------------------------------------

Key: HIVE-4879
URL: https://issues.apache.org/jira/browse/HIVE-4879
Project: Hive
Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Fix For: 0.12.0
Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt, HIVE-4879.4.patch.txt

Adding an annotation for impliesOrder

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4970) BinaryConverter does not respect nulls
[ https://issues.apache.org/jira/browse/HIVE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729671#comment-13729671 ]

Hudson commented on HIVE-4970:
------------------------------

ABORTED: Integrated in Hive-trunk-hadoop2 #328 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/328/])

HIVE-4970 BinaryConverter does not respect null (Mark Wagner via egc)
Submitted by: Mark Wagner
Reviewed by: Edward Capriolo
(ecapriolo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510263)

* /hive/trunk/ql/src/test/queries/clientpositive/ba_table_udfs.q
* /hive/trunk/ql/src/test/results/clientpositive/ba_table_udfs.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorConverter.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorConverters.java

BinaryConverter does not respect nulls
--------------------------------------

Key: HIVE-4970
URL: https://issues.apache.org/jira/browse/HIVE-4970
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Mark Wagner
Assignee: Mark Wagner
Fix For: 0.12.0
Attachments: HIVE-4970.1.patch, HIVE-4970.2.patch

Right now, the BinaryConverter in PrimitiveObjectInspectorConverter does not handle null values the same as the other converters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4683) fix coverage org.apache.hadoop.hive.cli
[ https://issues.apache.org/jira/browse/HIVE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729669#comment-13729669 ]

Hudson commented on HIVE-4683:
------------------------------

ABORTED: Integrated in Hive-trunk-hadoop2 #328 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/328/])

HIVE-4683 : fix coverage org.apache.hadoop.hive.cli (Aleksey Gorshkov via Ashutosh Chauhan)
(hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510346)

* /hive/trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
* /hive/trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java
* /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliDriverMethods.java
* /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliSessionState.java
* /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestOptionsProcessor.java
* /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestRCFileCat.java

fix coverage org.apache.hadoop.hive.cli
---------------------------------------

Key: HIVE-4683
URL: https://issues.apache.org/jira/browse/HIVE-4683
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
Fix For: 0.12.0
Attachments: HIVE-4683-branch-0.10.patch, HIVE-4683-branch-0.10-v1.patch, HIVE-4683-branch-0.11-v1.patch, HIVE-4683-trunk.patch, HIVE-4683-trunk-v1.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729665#comment-13729665 ]

Edward Capriolo commented on HIVE-4964:
---------------------------------------

With a quick look:

1) You're not using the correct formatting rules for the project. We cannot accept code that does not match the coding conventions.
{code}
+while (pItr.hasNext())
+{
+  Object oRow = pItr.next();
+  forward(oRow, outputObjInspector);
+}
{code}

2) The implementing class should not be on the left side of the equals. Use List, not ArrayList, when possible.
{code}
ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
{code}

Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
-----------------------------------------------------------------------------------------------

Key: HIVE-4964
URL: https://issues.apache.org/jira/browse/HIVE-4964
Project: Hive
Issue Type: Bug
Reporter: Harish Butani
Priority: Minor
Attachments: HIVE-4964.D11985.1.patch

There are still pieces of code that deal with:
- supporting select expressions with Windowing
- supporting a filter with windowing

Need to do this before introducing Perf. improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
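The second review point above ("program to the interface type") looks like this in isolation. This is a minimal standalone sketch of the convention, not code from the HIVE-4964 patch; the names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceTypeSketch {
    // Declare against the interface type; only the constructor call names
    // the concrete class. A different List implementation (e.g. LinkedList)
    // can then be swapped in later without touching any use sites.
    static List<String> fieldNames() {
        List<String> names = new ArrayList<String>();
        names.add("col1");
        names.add("col2");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames()); // prints [col1, col2]
    }
}
```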
Re: Tez branch and tez based patches
Which talk are you referencing here? AFAIK all the Hive code we've written is being pushed back into the Tez branch, so you should be able to see it there.

Alan.

On Jul 29, 2013, at 9:02 PM, Edward Capriolo wrote:

At ~25:00: "There is a working prototype of hive which is using tez as the targeted runtime". Can I get a look at that code? Is it on github?

Edward

On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote:

Answers to some of your questions inlined.

Alan.

On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:

There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html

"The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures."

https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different

"All other decisions happen on the dev list, discussions on the private list are kept to a minimum. If it didn't happen on the dev list, it didn't happen - which leads to:
a) Elections of committers and PMC members are published on the dev list once finalized.
b) Out-of-band discussions (IRC etc.) are summarized on the dev list as soon as they have impact on the project, code or community."

- https://issues.apache.org/jira/browse/HIVE-4660, ironically titled "Let their be Tez", has not been +1'ed by any committer. It was never discussed on the dev or the user list (as far as I can tell).

As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list.

As a PMC member I feel we need more discussion on Tez on the dev list, along with a wiki-fied design document. Topics of discussion should include:

I talked with Gunther and he's working on posting a design doc on the wiki. He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki.

1) What is tez?

In Hadoop 2.0, YARN opens up the ability to have multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as the only execution option. Tez is an effort to build an execution engine that is optimized for relational data processing, such as Hive and Pig. The biggest change here is to move away from only Map and Reduce as processing options and to allow alternate combinations of processing, such as map - reduce - reduce, tasks that take multiple inputs, or shuffles that avoid sorting when it isn't needed. For a good intro to Tez, see Arun's presentation on it at the recent Hadoop Summit (video: http://www.youtube.com/watch?v=9ZLLzlsz7h8, slides: http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212).

2) How is tez different from oozie, http://code.google.com/p/hop/, http://cs.brown.edu/~backman/cmr.html, and other DAG and/or streaming map reduce tools/frameworks? Why should we use this and not those?

Oozie is a completely different thing. Oozie is a workflow engine and a scheduler. Its core competencies are the ability to coordinate workflows of disparate job types (MR, Pig, Hive, etc.) and to schedule them. It is not intended as an execution engine for apps such as Pig and Hive.

I am not familiar with these other engines, but the short answer is that Tez is built to work on YARN, which works well for Hive since it is tied to Hadoop.

3) When can we expect the first tez release?

I don't know, but I hope sometime this fall.

4) How much effort is involved in integrating hive and tez?

Covered in the design doc.

5) Who is ready to commit to this effort?

I'll let people speak for themselves on that one.

6) Can we expect this work to be done in one hive release?

Unlikely. Initial integration will be done in one release, but as Tez is a new project I expect it will be adding features in the future that Hive will want to take advantage of.

In my opinion we should not start any work on this tez-hive integration until these questions are answered to the satisfaction of the hive developers.

Can we change this to "not commit patches"? We can't tell willing people not to work on it.
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Status: Open (was: Patch Available) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: (was: HIVE-4870.patch) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Tez branch and tez based patches
On Jul 29, 2013, at 9:53 PM, Edward Capriolo wrote: Also watched http://www.ustream.tv/recorded/36323173 I definitely see the win in being able to stream inter-stage output. I see some cases where small intermediate results can be kept in memory. But I was somewhat under the impression that the map reduce spill settings kept stuff in memory, isn't that what spill settings are? No. MapReduce always writes shuffle data to local disk. And intermediate results between MR jobs are always persisted to HDFS, as there's no other option. When we talk of being able to keep intermediate results in memory we mean getting rid of both of these disk writes/reads when appropriate (meaning not always; there's a trade off between speed and error handling to be made here, see below for more details). There are a few bullet points that came up repeatedly that I do not follow: Something was said to the effect of "Container reuse makes X faster." Hadoop has jvm reuse. Not following what the difference is here? Not everyone has a 10K node cluster. Sharing JVMs across users is inherently insecure (we can't guarantee what code the first user left behind that may interfere with later users). As I understand container re-use in Tez it constrains the re-use to one user for security reasons, but still avoids additional JVM start up costs. But this is a question that the Tez guys could answer better on the Tez lists (d...@tez.incubator.apache.org) "Joins in map reduce are hard" Really? I mean some of them are I guess, but the typical join is very easy. Just shuffle by the join key. There was not really enough low-level detail here saying why joins are better in tez. Join is not a natural operation in MapReduce. MR gives you one input and one output. You end up having to bend the rules to have multiple inputs. The idea here is that Tez can provide operators that naturally work with joins and other operations that don't fit the one input/one output model (e.g. unions, etc.). 
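[Editorial note: the "bend the rules" approach Alan describes is typically a reduce-side join: tag each record with which input it came from, shuffle both inputs on the join key, and pair them up in the reducer. The sketch below simulates that with plain Java collections, with no Hadoop dependency; the class and method names are illustrative, not from Hive or Tez.]

```java
import java.util.*;

public class ReduceSideJoinSketch {

    // "Shuffle" phase: tag each (key, value) record with its source and group by join key.
    static Map<String, List<String[]>> shuffle(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> byKey = new TreeMap<>();
        for (String[] kv : left) {
            byKey.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(new String[]{"L", kv[1]});
        }
        for (String[] kv : right) {
            byKey.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(new String[]{"R", kv[1]});
        }
        return byKey;
    }

    // "Reduce" phase: for each key, cross the left-tagged values with the right-tagged ones.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : shuffle(left, right).entrySet()) {
            List<String> ls = new ArrayList<>();
            List<String> rs = new ArrayList<>();
            for (String[] tagged : e.getValue()) {
                if ("L".equals(tagged[0])) { ls.add(tagged[1]); } else { rs.add(tagged[1]); }
            }
            for (String l : ls) {
                for (String r : rs) {
                    out.add(e.getKey() + ":" + l + "," + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> emp = Arrays.asList(new String[]{"1", "alice"}, new String[]{"2", "bob"});
        List<String[]> dept = Arrays.asList(new String[]{"1", "eng"}, new String[]{"3", "sales"});
        // Inner join: only key "1" has rows on both sides.
        System.out.println(join(emp, dept));
    }
}
```

The source-tagging is exactly the workaround for MR's one-input model; an engine whose tasks accept multiple labeled inputs makes the tagging unnecessary.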
"Choosing the number of maps and reduces is hard" Really? I do not find it that hard, I think there are times when it's not perfect but I do not find it hard. The talk did not really offer anything technical here on how tez makes this better other than that it could make it better. Perhaps manual would be a better term here than hard. In our experience it takes quite a bit of engineering trial and error to determine the optimal numbers. This may be ok if you're going to invest the time once and then run the same query every day for 6 months. But obviously it doesn't work for the ad hoc case. Even in the batch case it's not optimal because every once in a while an engineer has to go back and re-optimize the query to deal with changing data sizes, data characteristics, etc. We want the optimizer to handle this without human intervention. The presentations mentioned streaming data; how do two nodes stream data between tasks and how is it reliable? If the sender or receiver dies does the entire process have to start again? If the sender or receiver dies then the query has to be restarted from some previous point where data was persisted to disk. The idea here is that speed vs error recovery trade offs should be made by the optimizer. If the optimizer estimates that a query will complete in 5 seconds it can stream everything and if a node fails it just re-runs the whole query. If it estimates that a particular phase of a query will run for an hour it can choose to persist the results to HDFS so that in the event of a failure downstream the long phase need not be re-run. Again we want this to be done automatically by the system so the user doesn't need to control this level of detail. Again one of the talks implied there is a prototype out there that launches hive jobs into tez. I would like to see that; it might answer more questions than a PowerPoint, and I could profile some common queries. 
As mentioned in a previous email afaik Gunther's pushed all these changes to the Tez branch in Hive. Alan. Random late night thoughts over, Ed On Tue, Jul 30, 2013 at 12:02 AM, Edward Capriolo edlinuxg...@gmail.com wrote: At ~25:00 "There is a working prototype of hive which is using tez as the targeted runtime" Can I get a look at that code? Is it on github? Edward On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-4388: --- Component/s: HBase Handler HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 12827: HIVE-4611 - SMB joins fail based on bigtable selection policy.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12827/ --- (Updated Aug. 5, 2013, 5:57 p.m.) Review request for hive, Ashutosh Chauhan, Brock Noland, and Gunther Hagleitner. Changes --- Addressed Gunther's comments. Bugs: HIVE-4611 https://issues.apache.org/jira/browse/HIVE-4611 Repository: hive-git Description --- SMB joins fail based on bigtable selection policy. The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java cc9de54 ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeBasedBigTableSelectorForAutoSMJ.java 5320143 ql/src/java/org/apache/hadoop/hive/ql/optimizer/BigTableSelectorForAutoSMJ.java db5ff0f ql/src/java/org/apache/hadoop/hive/ql/optimizer/LeftmostBigTableSelectorForAutoSMJ.java db3c9e7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java cd1b4ad ql/src/java/org/apache/hadoop/hive/ql/optimizer/TableSizeBasedBigTableSelectorForAutoSMJ.java b882f87 ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java 3071713 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java e214807 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java da5115b ql/src/test/queries/clientnegative/auto_sortmerge_join_1.q c858254 ql/src/test/queries/clientpositive/auto_sortmerge_join_15.q PRE-CREATION ql/src/test/results/clientnegative/auto_sortmerge_join_1.q.out 0eddb69 ql/src/test/results/clientnegative/smb_bucketmapjoin.q.out 7a5b8c1 
ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out PRE-CREATION Diff: https://reviews.apache.org/r/12827/diff/ Testing --- All tests pass on hadoop 1. Thanks, Vikram Dixit Kumaraswamy
[jira] [Updated] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4611: - Attachment: HIVE-4611.5.patch.txt Addressed Gunther's comments. SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.5.patch.txt, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: HIVE-4870.patch Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729726#comment-13729726 ] Vikram Dixit K commented on HIVE-4611: -- I deleted that test because it is no longer a negative test. I moved it to positive tests. SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.5.patch.txt, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Status: Patch Available (was: Open) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 12690: HIVE-4870: Explain Extended to show partition info for Fetch Task
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12690/ --- (Updated Aug. 5, 2013, 6:04 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch adds Partition Description info to Fetch Task. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 65c39d6 ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out 0e8f96b ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 42e25fa ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 47a8635 ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out c39d057 ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out bd7381f ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out 6121722 ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out e0cd848 ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out 924fbad ql/src/test/results/clientpositive/bucketcontext_1.q.out 62910fb ql/src/test/results/clientpositive/bucketcontext_2.q.out 0857c9d ql/src/test/results/clientpositive/bucketcontext_3.q.out 69dc2b2 ql/src/test/results/clientpositive/bucketcontext_4.q.out 0d79901 ql/src/test/results/clientpositive/bucketcontext_7.q.out 19ea4fa ql/src/test/results/clientpositive/bucketcontext_8.q.out 9a7aaa0 ql/src/test/results/clientpositive/bucketmapjoin1.q.out 307132b ql/src/test/results/clientpositive/bucketmapjoin10.q.out 1a6bc06 ql/src/test/results/clientpositive/bucketmapjoin11.q.out bd9b1fe ql/src/test/results/clientpositive/bucketmapjoin12.q.out fc161a9 ql/src/test/results/clientpositive/bucketmapjoin13.q.out 30d8925 ql/src/test/results/clientpositive/bucketmapjoin2.q.out ebbb2ba ql/src/test/results/clientpositive/bucketmapjoin3.q.out 66918b6 ql/src/test/results/clientpositive/bucketmapjoin7.q.out 8105ba4 
ql/src/test/results/clientpositive/bucketmapjoin8.q.out 92c74a9 ql/src/test/results/clientpositive/bucketmapjoin9.q.out b7aec66 ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 2d803db ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 4b8bd14 ql/src/test/results/clientpositive/join32.q.out 92d81b9 ql/src/test/results/clientpositive/join32_lessSize.q.out 82b3e4a ql/src/test/results/clientpositive/join33.q.out 92d81b9 ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out f6aae06 ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out dbce51a ql/src/test/results/clientpositive/stats11.q.out 9a5be33 ql/src/test/results/clientpositive/union22.q.out bec39f4 Diff: https://reviews.apache.org/r/12690/diff/ Testing --- All the hive unit tests passed. Thanks, John Pullokkaran
[jira] [Updated] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4123: - Tags: orc, rle, encoding Fix Version/s: 0.12.0 Labels: orcfile (was: ) Affects Version/s: 0.12.0 Status: Patch Available (was: Open) Making patch available. The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4989) Consolidate and simplify vectorization code and test generation
[ https://issues.apache.org/jira/browse/HIVE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729745#comment-13729745 ] Tony Murphy commented on HIVE-4989: --- This patch should be good to go. HIVE-4971 covers the testVectorUDFUnixTimeStampLong failure. Consolidate and simplify vectorization code and test generation --- Key: HIVE-4989 URL: https://issues.apache.org/jira/browse/HIVE-4989 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4989-vectorization.patch The current code generation is unwieldy to use and prone to errors. This change consolidates all the code and test generation into a single location, and removes the need to manually place files which can lead to missing or incomplete code or tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request 13274: Consolidate and simplify vectorization code and test generation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13274/ --- Review request for hive, Eric Hanson, Jitendra Pandey, Remus Rusanu, and Sarvesh Sakalanaga. Bugs: HIVE-4989 https://issues.apache.org/jira/browse/HIVE-4989 Repository: hive-git Description --- The current code generation is unwieldy to use and prone to errors. This change consolidates all the code and test generation into a single location, and removes the need to manually place files which can lead to missing or incomplete code or tests. New usage: From ql\src\gen\vectorization: javac org\apache\hadoop\hive\ql\exec\vector\gen*.java java org.apache.hadoop.hive.ql.exec.vector.gen.CodeGen Additionally, I've fixed some incomplete\broken test generations. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java a4c1999 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ColumnArithmeticColumn.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ColumnArithmeticScalar.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ColumnCompareScalar.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ColumnUnaryMinus.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterColumnCompareColumn.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterColumnCompareScalar.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterScalarCompareColumn.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringColumnCompareColumn.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringColumnCompareScalar.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/FilterStringScalarCompareColumn.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/ScalarArithmeticColumn.txt 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java 34c093c ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt 5b53d6a ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFAvg.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFMinMax.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFMinMaxString.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFSum.txt ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFVar.txt ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java dd2cc09 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java 6baa444 Diff: https://reviews.apache.org/r/13274/diff/ Testing --- Thanks, tony murphy
[jira] [Updated] (HIVE-4995) select * may incorrectly return empty fields with hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-4995: --- Attachment: HIVE-4995.1.patch.txt select * may incorrectly return empty fields with hbase-handler --- Key: HIVE-4995 URL: https://issues.apache.org/jira/browse/HIVE-4995 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.11.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-4995.1.patch.txt, HIVE-4995.1.patch.txt HIVE-3725 added capability to pull hbase columns with prefixes. However the way the current logic to add columns stands in HiveHBaseTableInputFormat, it might cause some columns to incorrectly display empty fields. Consider the following query: {noformat} CREATE EXTERNAL TABLE test_table(key string, value1 map<string,string>, value2 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf-a:prefix.*,cf-a:another_col") TBLPROPERTIES ("hbase.table.name" = "test_table"); {noformat} Given the existing logic in HiveHBaseTableInputFormat: {code} for (int i = 0; i < columnsMapping.size(); i++) { ColumnMapping colMap = columnsMapping.get(i); if (colMap.hbaseRowKey) { continue; } if (colMap.qualifierName == null) { scan.addFamily(colMap.familyNameBytes); } else { scan.addColumn(colMap.familyNameBytes, colMap.qualifierNameBytes); } } {code} So for the above query, the 'addFamily' will be called first followed by 'addColumn' for the column family cf-a. This will wipe away whatever we had set with the 'addFamily' call in the previous step resulting in an empty column when queried. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
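[Editorial note: to make the order-dependence described in HIVE-4995 concrete, here is a toy model of the column-selection logic in plain Java. This is not the real HBase Scan API and the names are illustrative; it only models the idea that an empty qualifier set means "the whole family", that a later addColumn narrows a whole-family request, and that a sketched fix is to skip addColumn when the whole family was already requested.]

```java
import java.util.*;

public class ScanModelSketch {
    // family -> requested qualifiers; an EMPTY set stands for "the whole family".
    final Map<String, Set<String>> selection = new HashMap<>();

    void addFamily(String family) {
        selection.put(family, new HashSet<>());          // request every qualifier in the family
    }

    // Models the problematic behavior: a later addColumn narrows a
    // whole-family request down to just the named qualifier.
    void addColumn(String family, String qualifier) {
        selection.computeIfAbsent(family, f -> new HashSet<>()).add(qualifier);
    }

    // Sketched fix: leave an existing whole-family request untouched.
    void addColumnFixed(String family, String qualifier) {
        Set<String> quals = selection.get(family);
        if (quals != null && quals.isEmpty()) {
            return;                                      // whole family already requested
        }
        addColumn(family, qualifier);
    }

    public static void main(String[] args) {
        ScanModelSketch s = new ScanModelSketch();
        s.addFamily("cf-a");
        s.addColumn("cf-a", "another_col");              // narrows cf-a to a single qualifier
        System.out.println(s.selection);
    }
}
```

With the mapping `:key,cf-a:prefix.*,cf-a:another_col`, the buggy order of calls leaves only `another_col` selected for `cf-a`, which is why the prefix-mapped map column comes back empty.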
[jira] [Commented] (HIVE-4573) Support alternate table types for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729763#comment-13729763 ] Thejas M Nair commented on HIVE-4573: - bq. Although it seems that eventually (.13?) you would want the default to be CLASSIC If we don't set the default to CLASSIC sooner, it would never happen. As time goes by, more applications would start relying on this behavior. As [~the6campbells] points out, the CLASSIC behavior is documented to be the 'normal' behavior. While we should aim for backward compatibility, I am not sure if that applies to bugs as well. The managed vs external table information can certainly be very useful. It would be good to get that without changing the server configuration. Should we rely on something like 'describe table extended' for that? While I don't agree on the default, I don't think perfect should get in the way of good. This improves things by making the classic behavior possible. We can discuss the default in a separate jira. +1 for the patch. Support alternate table types for HiveServer2 - Key: HIVE-4573 URL: https://issues.apache.org/jira/browse/HIVE-4573 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-4573.1.patch, HIVE-4573.2.patch The getTables jdbc function no longer returns information when using normal JDBC table types like TABLE or VIEW. You must now use a more specific type such as MANAGED_TABLE or VIRTUAL_VIEW. An example application that will fail to return results against 0.10 is below, works without issue in 0.9. In my 0.10 test I used HS2. 
{code} import java.sql.SQLException; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.sql.DriverManager; import org.apache.hive.jdbc.HiveDriver; import java.sql.DatabaseMetaData; public class TestGet { private static String driverName = "org.apache.hive.jdbc.HiveDriver"; /** * @param args * @throws SQLException */ public static void main(String[] args) throws SQLException { try { Class.forName(driverName); } catch (ClassNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); System.exit(1); } Connection con = DriverManager.getConnection("jdbc:hive2://hostname:1/default"); DatabaseMetaData dbmd = con.getMetaData(); String[] types = {"TABLE"}; ResultSet rs = dbmd.getTables(null, null, "%", types); while (rs.next()) { System.out.println(rs.getString("TABLE_NAME")); } } } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4573) Support alternate table types for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729771#comment-13729771 ] Edward Capriolo commented on HIVE-4573: --- External tables are exceedingly rare and commonly misunderstood. I do not understand why a driver would care if a table was external or managed, that is just an implementation detail. Support alternate table types for HiveServer2 - Key: HIVE-4573 URL: https://issues.apache.org/jira/browse/HIVE-4573 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-4573.1.patch, HIVE-4573.2.patch The getTables jdbc function no longer returns information when using normal JDBC table types like TABLE or VIEW. You must now use a more specific type such as MANAGED_TABLE or VIRTUAL_VIEW. An example application that will fail to return results against 0.10 is below, works without issue in 0.9. In my 0.10 test I used HS2. {code} import java.sql.SQLException; import java.sql.Connection; import java.sql.ResultSet; import java.sql.Statement; import java.sql.DriverManager; import org.apache.hive.jdbc.HiveDriver; import java.sql.DatabaseMetaData; public class TestGet { private static String driverName = "org.apache.hive.jdbc.HiveDriver"; /** * @param args * @throws SQLException */ public static void main(String[] args) throws SQLException { try { Class.forName(driverName); } catch (ClassNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); System.exit(1); } Connection con = DriverManager.getConnection("jdbc:hive2://hostname:1/default"); DatabaseMetaData dbmd = con.getMetaData(); String[] types = {"TABLE"}; ResultSet rs = dbmd.getTables(null, null, "%", types); while (rs.next()) { System.out.println(rs.getString("TABLE_NAME")); } } } {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4995) select * may incorrectly return empty fields with hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729768#comment-13729768 ] Brock Noland commented on HIVE-4995: +1 select * may incorrectly return empty fields with hbase-handler --- Key: HIVE-4995 URL: https://issues.apache.org/jira/browse/HIVE-4995 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.11.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-4995.1.patch.txt, HIVE-4995.1.patch.txt HIVE-3725 added capability to pull hbase columns with prefixes. However the way the current logic to add columns stands in HiveHBaseTableInputFormat, it might cause some columns to incorrectly display empty fields. Consider the following query: {noformat} CREATE EXTERNAL TABLE test_table(key string, value1 map<string,string>, value2 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf-a:prefix.*,cf-a:another_col") TBLPROPERTIES ("hbase.table.name" = "test_table"); {noformat} Given the existing logic in HiveHBaseTableInputFormat: {code} for (int i = 0; i < columnsMapping.size(); i++) { ColumnMapping colMap = columnsMapping.get(i); if (colMap.hbaseRowKey) { continue; } if (colMap.qualifierName == null) { scan.addFamily(colMap.familyNameBytes); } else { scan.addColumn(colMap.familyNameBytes, colMap.qualifierNameBytes); } } {code} So for the above query, the 'addFamily' will be called first followed by 'addColumn' for the column family cf-a. This will wipe away whatever we had set with the 'addFamily' call in the previous step resulting in an empty column when queried. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729777#comment-13729777 ] Hive QA commented on HIVE-4388: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596151/HIVE-4388.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 2759 tests executed *Failed tests:* {noformat} org.apache.hcatalog.mapreduce.TestSequenceFileReadWrite.testTextTableWriteReadMR org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/307/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/307/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4343) HS2 with kerberos- local task for map join fails
[ https://issues.apache.org/jira/browse/HIVE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729784#comment-13729784 ] Arup Malakar commented on HIVE-4343: This patch causes compilation failure when compiled against hadoop 20. I tried _ant clean package -Dhadoop.mr.rev=20_ {code} [echo] Project: ql [javac] Compiling 904 source files to /Users/malakar/code/oss/hive/build/ql/classes [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/SecureCmdDoAs.java:51: cannot find symbol [javac] symbol : variable HADOOP_TOKEN_FILE_LOCATION [javac] location: class org.apache.hadoop.security.UserGroupInformation [javac] env.put(UserGroupInformation.HADOOP_TOKEN_FILE_LOCATION, [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 1 error {code} HS2 with kerberos- local task for map join fails Key: HIVE-4343 URL: https://issues.apache.org/jira/browse/HIVE-4343 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4343.1.patch, HIVE-4343.2.patch, HIVE-4343.3.patch With hive server2 configured with kerberos security, when a (map) join query is run, it results in failure with GSSException: No valid credentials provided -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4992) add ability to skip javadoc during build
[ https://issues.apache.org/jira/browse/HIVE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4992: -- Attachment: HIVE-4992.D11967.2.patch sershe updated the revision HIVE-4992 [jira] add ability to skip javadoc during build. Also change hcatalog/build.xml Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D11967 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11967?vs=36885&id=36993#toc AFFECTED FILES build.xml hcatalog/build.xml To: JIRA, sershe add ability to skip javadoc during build Key: HIVE-4992 URL: https://issues.apache.org/jira/browse/HIVE-4992 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-4992.D11967.1.patch, HIVE-4992.D11967.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4343) HS2 with kerberos- local task for map join fails
[ https://issues.apache.org/jira/browse/HIVE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729800#comment-13729800 ] Thejas M Nair commented on HIVE-4343: - [~amalakar] HIVE-4991 has fix for the 0.20 build issue caused by this patch. HS2 with kerberos- local task for map join fails Key: HIVE-4343 URL: https://issues.apache.org/jira/browse/HIVE-4343 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4343.1.patch, HIVE-4343.2.patch, HIVE-4343.3.patch With hive server2 configured with kerberos security, when a (map) join query is run, it results in failure with GSSException: No valid credentials provided -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4343) HS2 with kerberos- local task for map join fails
[ https://issues.apache.org/jira/browse/HIVE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729805#comment-13729805 ] Arup Malakar commented on HIVE-4343: [~thejas]Thanks for the update. Saw the patch in HIVE-4991. HS2 with kerberos- local task for map join fails Key: HIVE-4343 URL: https://issues.apache.org/jira/browse/HIVE-4343 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4343.1.patch, HIVE-4343.2.patch, HIVE-4343.3.patch With hive server2 configured with kerberos security, when a (map) join query is run, it results in failure with GSSException: No valid credentials provided -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4991) hive build with 0.20 is broken
[ https://issues.apache.org/jira/browse/HIVE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729808#comment-13729808 ] Arup Malakar commented on HIVE-4991: I see the following error complaining about an import in HiveSessionImpl.java. HiveSessionImpl doesn't actually use that import, and removing it fixed the problem for me.
{code}
[echo] Project: service
[javac] Compiling 144 source files to /Users/malakar/code/oss/hive/build/service/classes
[javac] /Users/malakar/code/oss/hive/service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java:29: package org.apache.commons.io does not exist
[javac] import org.apache.commons.io.FileUtils;
[javac] ^
[javac] Note: /Users/malakar/code/oss/hive/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error {code} hive build with 0.20 is broken -- Key: HIVE-4991 URL: https://issues.apache.org/jira/browse/HIVE-4991 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Edward Capriolo Priority: Blocker Labels: newbie Attachments: HIVE-4991.2.patch.txt, HIVE-4991.patch.txt As reported in HIVE-4911 ant clean package -Dhadoop.mr.rev=20 Fails with - {code} compile: [echo] Project: ql [javac] Compiling 898 source files to /Users/malakar/code/oss/hive/build/ql/classes [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:35: package org.apache.commons.io does not exist [javac] import org.apache.commons.io.FileUtils; [javac] ^ [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:743: cannot find symbol [javac] symbol : variable FileUtils [javac] location: class org.apache.hadoop.hive.ql.session.SessionState [javac] FileUtils.deleteDirectory(resourceDir); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 12824: [HIVE-4911] Enable QOP configuration for Hive Server 2 thrift transport
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12824/ --- (Updated Aug. 5, 2013, 6:54 p.m.) Review request for hive. Changes --- Rebased. Bugs: HIVE-4911 https://issues.apache.org/jira/browse/HIVE-4911 Repository: hive-git Description --- The QoP for HiveServer2 should be configurable to enable encryption. A new configuration, hive.server2.thrift.rpc.protection, should be exposed. This would give greater control when configuring the HiveServer2 service. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 555343ebffb9dcd5e58d5b99ce9ca52904f68ecf conf/hive-default.xml.template f01e715e4de95b4011210143f7d3add2d8a4d432 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 00f43511b478c687b7811fc8ad66af2b507a3626 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java cde58c25991641573453217da71a7ac1acf6adfd metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java cef50f40ccb047a8135f704b2997968a2cf477b8 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 88151a1d48b12cf3a8346ae94b6d1a182a331992 service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java 1809e1b26ceee5de14a354a0e499aa8c0ab793bf service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 379dafb8377aed55e74f0ae18407996bb9e1216f service/src/java/org/apache/hive/service/auth/SaslQOP.java PRE-CREATION shims/src/common-secure/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java 1df6993cb9aac1bb195667b3123faee27d657c0a shims/src/common-secure/test/org/apache/hadoop/hive/thrift/TestHadoop20SAuthBridge.java 3e850ec3991cbb2d4343969ba8fe9df4a7d137b5 shims/src/common/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java ab7f5c0eb5345e68e3f223c9dfed8414de946661 Diff: https://reviews.apache.org/r/12824/diff/ Testing --- Thanks, Arup Malakar
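For readers unfamiliar with SASL quality-of-protection values, the sketch below shows how a hive.server2.thrift.rpc.protection setting might translate into SASL properties. The QopConfig class and its method are illustrative only, not the actual HiveAuthFactory/SaslQOP code from the patch; only the standard javax.security.sasl constants from the JDK are assumed.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

// Illustrative mapping from a configured protection level to the SASL
// properties that a Thrift SASL transport would be created with.
public class QopConfig {
    public static Map<String, String> saslProps(String protection) {
        String qop;
        switch (protection) {
            case "auth":      qop = "auth";      break; // authentication only
            case "auth-int":  qop = "auth-int";  break; // + integrity checking
            case "auth-conf": qop = "auth-conf"; break; // + confidentiality (encryption)
            default: throw new IllegalArgumentException("unknown QOP: " + protection);
        }
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, qop);           // "javax.security.sasl.qop"
        props.put(Sasl.SERVER_AUTH, "true"); // mutual authentication
        return props;
    }

    public static void main(String[] args) {
        System.out.println(saslProps("auth-conf").get(Sasl.QOP)); // auth-conf
    }
}
```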
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729816#comment-13729816 ] Hive QA commented on HIVE-4123: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594072/HIVE-4123.4.patch.txt Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/308/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/308/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-308/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ant/src/org/apache/hadoop/hive/ant/antlib.xml' Reverted 'hbase-handler/ivy.xml' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java' Reverted 'build.xml' Reverted 'ivy/libraries.properties' Reverted 'hcatalog/core/build.xml' Reverted 'hcatalog/pom.xml' Reverted 'hcatalog/build.properties' Reverted 'hcatalog/build.xml' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestRevisionManager.java' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestRevisionManagerEndpoint.java' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/ManyMiniCluster.java' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseDirectOutputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseBulkOutputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseInputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/TableSnapshot.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerProtocol.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/Transaction.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManager.java' Reverted 
'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerEndpointClient.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerEndpoint.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/ZKBasedRevisionManager.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/ImportSequenceFile.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HbaseSnapshotRecordReader.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseDirectOutputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBulkOutputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseInputFormat.java' Reverted 'hcatalog/storage-handlers/hbase/pom.xml' Reverted 'hcatalog/build-support/ant/build-common.xml' Reverted 'hcatalog/build-support/ant/deploy.xml' Reverted 'hcatalog/build-support/ant/checkstyle.xml' Reverted 'hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestE2EScenarios.java' Reverted 'build-common.xml' Reverted '.gitignore' Reverted 'ql/ivy.xml' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf build ant/src/org/apache/hadoop/hive/ant/SetSystemProperty.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/PutWritable.java
[jira] [Updated] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport
[ https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated HIVE-4911: --- Attachment: 20-build-temp-change-1.patch HIVE-4911-trunk-3.patch I used 20-build-temp-change-1.patch to compile against 20. [~thejas] Let me know if you have any comments. Enable QOP configuration for Hive Server 2 thrift transport --- Key: HIVE-4911 URL: https://issues.apache.org/jira/browse/HIVE-4911 Project: Hive Issue Type: New Feature Reporter: Arup Malakar Assignee: Arup Malakar Attachments: 20-build-temp-change-1.patch, 20-build-temp-change.patch, HIVE-4911-trunk-0.patch, HIVE-4911-trunk-1.patch, HIVE-4911-trunk-2.patch, HIVE-4911-trunk-3.patch The QoP for HiveServer2 should be configurable to enable encryption. A new configuration, hive.server2.thrift.rpc.protection, should be exposed. This would give greater control when configuring the HiveServer2 service. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4998) support jdbc documented table types in default configuration
Thejas M Nair created HIVE-4998: --- Summary: support jdbc documented table types in default configuration Key: HIVE-4998 URL: https://issues.apache.org/jira/browse/HIVE-4998 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Thejas M Nair The jdbc table types supported by hive server2 are not the documented typical types [1] in jdbc; they are hive-specific types (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). HIVE-4573 added support for the jdbc documented typical types, but the HS2 default configuration is to return the hive types. The default configuration should result in the expected jdbc typical behavior. [1] http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart
Brock Noland created HIVE-4999: -- Summary: Shim class HiveHarFileSystem does not have a hadoop2 counterpart Key: HIVE-4999 URL: https://issues.apache.org/jira/browse/HIVE-4999 Project: Hive Issue Type: Sub-task Reporter: Brock Noland HiveHarFileSystem only exists in the 0.20 shim. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4988) HCatalog Pig Adapter test does not compile under hadoop2
[ https://issues.apache.org/jira/browse/HIVE-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-4988. Resolution: Duplicate HCatalog Pig Adapter test does not compile under hadoop2 Key: HIVE-4988 URL: https://issues.apache.org/jira/browse/HIVE-4988 Project: Hive Issue Type: Sub-task Reporter: Brock Noland {noformat} compile-test: [echo] hcatalog-pig-adapter [mkdir] Created dir: /home/brock/workspaces/hive-apache/hive/hcatalog/hcatalog-pig-adapter/build/test/classes [javac] Compiling 14 source files to /home/brock/workspaces/hive-apache/hive/hcatalog/hcatalog-pig-adapter/build/test/classes [javac] /home/brock/workspaces/hive-apache/hive/hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hcatalog/pig/TestE2EScenarios.java:196: org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated [javac] TaskAttemptContext rtaskContext = new TaskAttemptContext(conf , taskId ); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] 1 error {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5000) hive.optimize.skewjoin can cause long running queries to fail
Brock Noland created HIVE-5000: -- Summary: hive.optimize.skewjoin can cause long running queries to fail Key: HIVE-5000 URL: https://issues.apache.org/jira/browse/HIVE-5000 Project: Hive Issue Type: Bug Reporter: Brock Noland Priority: Minor
{noformat}
MapReduce Total cumulative CPU time: 5 days 19 hours 7 minutes 8 seconds 540 msec
Ended Job = job_201301311513_15328
java.io.FileNotFoundException: File hdfs://:8020/tmp/hive-scripts/hive_2013-02-06_10-23-17_026_1520760778337129611/-mr-10002/hive_skew_join_bigkeys_0 does not exist.
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:406)
	at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
	at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:700)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Ended Job = -1079843427, job is filtered out (removed at runtime).
8390065 Rows loaded to hdfs:///tmp/hive-scripts/hive_2013-02-06_10-23-17_026_1520760778337129611/-ext-1
MapReduce Jobs Launched:
Job 0: Map: 970 Reduce: 260 Cumulative CPU: 500828.54 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 5 days 19 hours 7 minutes 8 seconds 540 msec
OK
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
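One plausible guard for the FileNotFoundException above is to treat a missing big-keys directory as "no skewed rows" rather than listing it unconditionally. The sketch below illustrates the idea with java.nio on the local filesystem as a stand-in for the Hadoop FileSystem API; SkewDirListing and its method name are hypothetical, not the actual fix in ConditionalResolverSkewJoin.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: list the skew-join temp directory only if it exists,
// so a long-running query doesn't fail when no big keys were spilled.
public class SkewDirListing {
    public static List<Path> listBigKeyDirs(Path skewJoinTmpDir) throws IOException {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(skewJoinTmpDir)) {
            return result; // nothing was spilled; no follow-up task needed
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(skewJoinTmpDir)) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A directory that was never created, as when no rows were skewed:
        // returns an empty list instead of throwing.
        System.out.println(listBigKeyDirs(Paths.get("hive_skew_join_bigkeys_0")).size());
    }
}
```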
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Attachment: (was: HIVE-4997.patch1) HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Status: Open (was: Patch Available) HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Attachment: HIVE-4997.1.patch HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch, HIVE-4997.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Status: Patch Available (was: Open) HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch, HIVE-4997.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4573) Support alternate table types for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729883#comment-13729883 ] Prasad Mujumdar commented on HIVE-4573: --- [~thejas] Thanks for setting up a separate ticket for the default behavior. It's a good idea to decouple that discussion from the base implementation. Support alternate table types for HiveServer2 - Key: HIVE-4573 URL: https://issues.apache.org/jira/browse/HIVE-4573 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-4573.1.patch, HIVE-4573.2.patch The getTables jdbc function no longer returns information when using normal JDBC table types like TABLE or VIEW. You must now use a more specific type such as MANAGED_TABLE or VIRTUAL_VIEW. An example application that will fail to return results against 0.10 is below; it works without issue in 0.9. In my 0.10 test I used HS2.
{code}
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveDriver;
import java.sql.DatabaseMetaData;

public class TestGet {
  private static String driverName = "org.apache.hive.jdbc.HiveDriver";

  /**
   * @param args
   * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
      System.exit(1);
    }
    Connection con = DriverManager.getConnection("jdbc:hive2://hostname:1/default");
    DatabaseMetaData dbmd = con.getMetaData();
    String[] types = {"TABLE"};
    ResultSet rs = dbmd.getTables(null, null, "%", types);
    while (rs.next()) {
      System.out.println(rs.getString("TABLE_NAME"));
    }
  }
}
{code}
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Attachment: (was: HIVE-4997.patch) HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Attachment: (was: HIVE-4997.1.patch) HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729885#comment-13729885 ] Hive QA commented on HIVE-4870: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596161/HIVE-4870.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2759 tests executed *Failed tests:* {noformat} org.apache.hcatalog.pig.TestOrcHCatLoaderComplexSchema.testTupleInBagInTupleInBag {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/309/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/309/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Intskirveli updated HIVE-4997: - Attachment: HIVE-4997.1.patch HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5001) [WebHCat] JobState is read/written with different user credentials
Eugene Koifman created HIVE-5001: Summary: [WebHCat] JobState is read/written with different user credentials Key: HIVE-5001 URL: https://issues.apache.org/jira/browse/HIVE-5001 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman JobState can be persisted to HDFS or Zookeeper. At various points in the lifecycle it's accessed with different user credentials, which may cause errors depending on how permissions are set. Example: When submitting an MR job, templeton.JarDelegator is used. It calls LauncherDelegator#queueAsUser(), which runs TempletonControllerJob with UserGroupInformation.doAs(). TempletonControllerJob will in turn create JobState and persist it. LauncherDelegator.registerJob() also modifies JobState but w/o doing a doAs(). So in the latter case it's possible that the JobState is persisted by a different user than the one that created/owns the file. templeton.tool.HDFSCleanup tries to delete these files w/o doAs. The 'childid' file, for example, is created with rw-r--r--, and its parent directory (job_201308051224_0001) has rwxr-xr-x. HDFSStorage doesn't set file permissions explicitly, so it must be using default permissions. So there is a potential issue here (depending on UMASK), especially once HIVE-4601 is addressed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5001) [WebHCat] JobState is read/written with different user credentials
[ https://issues.apache.org/jira/browse/HIVE-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5001: - Description: JobState can be persisted to HDFS or Zookeeper. At various points in the lifecycle it's accessed with different user credentials, which may cause errors depending on how permissions are set. Example: When submitting an MR job, templeton.JarDelegator is used. It calls LauncherDelegator#queueAsUser(), which runs TempletonControllerJob with UserGroupInformation.doAs(). TempletonControllerJob will in turn create JobState and persist it. LauncherDelegator.registerJob() also modifies JobState but w/o doing a doAs(). So in the latter case it's possible that the JobState is persisted by a different user than the one that created/owns the file. templeton.tool.HDFSCleanup tries to delete these files w/o doAs. The 'childid' file, for example, is created with rw-r--r--, and its parent directory (job_201308051224_0001) has rwxr-xr-x. HDFSStorage doesn't set file permissions explicitly, so it must be using default permissions. So there is a potential issue here (depending on UMASK), especially once HIVE-4601 is addressed. Actually, even w/o HIVE-4601 the user that owns the WebHCat process is likely different than the one submitting a request. was: JobState can be persisted to HDFS or Zookeeper. At various points in the lifecycle it's accessed with different user credentials, which may cause errors depending on how permissions are set. Example: When submitting an MR job, templeton.JarDelegator is used. It calls LauncherDelegator#queueAsUser(), which runs TempletonControllerJob with UserGroupInformation.doAs(). TempletonControllerJob will in turn create JobState and persist it. LauncherDelegator.registerJob() also modifies JobState but w/o doing a doAs(). So in the latter case it's possible that the JobState is persisted by a different user than the one that created/owns the file.
templeton.tool.HDFSCleanup tries to delete these files w/o doAs. The 'childid' file, for example, is created with rw-r--r--, and its parent directory (job_201308051224_0001) has rwxr-xr-x. HDFSStorage doesn't set file permissions explicitly, so it must be using default permissions. So there is a potential issue here (depending on UMASK), especially once HIVE-4601 is addressed. [WebHCat] JobState is read/written with different user credentials -- Key: HIVE-5001 URL: https://issues.apache.org/jira/browse/HIVE-5001 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman JobState can be persisted to HDFS or Zookeeper. At various points in the lifecycle it's accessed with different user credentials, which may cause errors depending on how permissions are set. Example: When submitting an MR job, templeton.JarDelegator is used. It calls LauncherDelegator#queueAsUser(), which runs TempletonControllerJob with UserGroupInformation.doAs(). TempletonControllerJob will in turn create JobState and persist it. LauncherDelegator.registerJob() also modifies JobState but w/o doing a doAs(). So in the latter case it's possible that the JobState is persisted by a different user than the one that created/owns the file. templeton.tool.HDFSCleanup tries to delete these files w/o doAs. The 'childid' file, for example, is created with rw-r--r--, and its parent directory (job_201308051224_0001) has rwxr-xr-x. HDFSStorage doesn't set file permissions explicitly, so it must be using default permissions. So there is a potential issue here (depending on UMASK), especially once HIVE-4601 is addressed. Actually, even w/o HIVE-4601 the user that owns the WebHCat process is likely different than the one submitting a request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
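The default-permission concern in the description can be sketched with a little permission arithmetic (illustrative Java, not WebHCat code; the class and method names here are hypothetical). With the common 022 umask, default creation modes come out as rw-r--r-- (644) for files and rwxr-xr-x (755) for directories, and deleting a file requires write+execute on its parent directory, so a cleanup process running as a different, non-owner user cannot remove the file:

```java
// Sketch of why default permissions break cross-user cleanup: a file is
// removable by a user only if that user has write+execute on the parent
// directory; with a 022 umask the parent ends up 755, so only the owner
// (and superuser) can delete entries in it.
public class UmaskSketch {
  /** Effective creation mode: requested base mode with the umask bits cleared. */
  static int effectiveMode(int base, int umask) {
    return base & ~umask;
  }

  /** Can a non-owner, non-group ("other") user remove an entry from a directory with this mode? */
  static boolean otherCanDelete(int dirMode) {
    int other = dirMode & 07;                      // low three bits: "other" permissions
    return (other & 02) != 0 && (other & 01) != 0; // needs both write and execute
  }

  public static void main(String[] args) {
    int dir = effectiveMode(0777, 022);   // 0755, i.e. rwxr-xr-x
    int file = effectiveMode(0666, 022);  // 0644, i.e. rw-r--r--
    System.out.printf("dir=%o file=%o otherCanDelete=%b%n",
        dir, file, otherCanDelete(dir));
  }
}
```

This is why the report notes the problem depends on UMASK: a permissive umask (e.g. 000) would leave the parent directory at 777 and mask the bug.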
[jira] [Commented] (HIVE-4573) Support alternate table types for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729892#comment-13729892 ] Gunther Hagleitner commented on HIVE-4573: -- [~appodictic], [~thejas], [~prasadm] - If I understand this correctly this patch is needed as is and I am planning to commit in a few hours. The discussion on default, deprecation etc is happening on HIVE-4998. Speak up in the next couple of hours if you disagree. Support alternate table types for HiveServer2 - Key: HIVE-4573 URL: https://issues.apache.org/jira/browse/HIVE-4573 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Minor Attachments: HIVE-4573.1.patch, HIVE-4573.2.patch The getTables jdbc function no longer returns information when using normal JDBC table types like TABLE or VIEW. You must now use a more specific type such as MANAGED_TABLE or VIRTUAL_VIEW. An example application that will fail to return results against 0.10 is below, works without issue in 0.9. In my 0.10 test I used HS2. 
{code}
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveDriver;
import java.sql.DatabaseMetaData;

public class TestGet {
  private static String driverName = "org.apache.hive.jdbc.HiveDriver";

  /**
   * @param args
   * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
      System.exit(1);
    }
    Connection con = DriverManager.getConnection("jdbc:hive2://hostname:1/default");
    DatabaseMetaData dbmd = con.getMetaData();
    String[] types = {"TABLE"};
    ResultSet rs = dbmd.getTables(null, null, "%", types);
    while (rs.next()) {
      System.out.println(rs.getString("TABLE_NAME"));
    }
  }
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
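The fix being discussed amounts to translating the classic JDBC table types a client passes to getTables ("TABLE", "VIEW") into the metastore-specific types HiveServer2 actually stores (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). A minimal sketch of that mapping, assuming illustrative names rather than the actual patch code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "classic" JDBC table-type mapping: expand the
// generic types a JDBC client sends into the Hive metastore types they
// should match. Class and method names are illustrative only.
public class TableTypeMapping {
  private static final Map<String, String[]> CLASSIC_TO_HIVE = new HashMap<>();
  static {
    CLASSIC_TO_HIVE.put("TABLE", new String[] {"MANAGED_TABLE", "EXTERNAL_TABLE"});
    CLASSIC_TO_HIVE.put("VIEW", new String[] {"VIRTUAL_VIEW"});
  }

  /** Expand a client-supplied type into the Hive types it should match. */
  public static String[] toHiveTypes(String clientType) {
    String[] mapped = CLASSIC_TO_HIVE.get(clientType.toUpperCase());
    // Unknown types pass through unchanged (they may already be Hive types).
    return mapped != null ? mapped : new String[] {clientType};
  }

  public static void main(String[] args) {
    for (String t : toHiveTypes("TABLE")) {
      System.out.println(t);
    }
  }
}
```

With a mapping like this in place, the example program above would again return results for types = {"TABLE"} on 0.10, as it did on 0.9.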
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729915#comment-13729915 ] Phabricator commented on HIVE-4051: --- ashutoshc has requested changes to the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N queries when querying partitions is slow. Mostly looks good. Can you update the final patch with the new class in its own file, with the following two comments (if they look alright)? INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1681 You are still selecting dbname, tblname ? metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1776 Yes.. I think we should throw in those cases. Having an empty list will mask the root problem if there is any which results from it. REVISION DETAIL https://reviews.facebook.net/D11805 BRANCH HIVE-4051 ARCANIST PROJECT hive To: JIRA, ashutoshc, sershe Cc: brock Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a Hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client.
The following 12 queries were repeated for each partition, generating a storm of SQL queries
{code}
4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL)
{code}
This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same hive client. The queries are automatically generated by JDO/DataNucleus, which makes it nearly impossible to rewrite it into a single denormalized join operation and process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
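The remedy behind HIVE-4051 is essentially batching: instead of issuing one SELECT per partition, fold all the partition ids into a single IN (...) query so the metastore issues a constant number of statements. A minimal sketch of the idea, assuming illustrative class and table shapes rather than the actual ObjectStore code:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of the 1+N -> 1 batching idea: build one query over
// all partition ids rather than one query per id. Not the actual Hive code.
public class BatchedPartitionQuery {
  /** The 1+N pattern: one statement per partition id. */
  static String singleRowQuery(long partId) {
    return "SELECT SD_ID FROM PARTITIONS WHERE PART_ID = " + partId;
  }

  /** The batched alternative: one statement for all ids. */
  static String batchedQuery(List<Long> partIds) {
    StringJoiner in = new StringJoiner(",", "(", ")");
    for (long id : partIds) {
      in.add(Long.toString(id));
    }
    return "SELECT PART_ID, SD_ID FROM PARTITIONS WHERE PART_ID IN " + in;
  }

  public static void main(String[] args) {
    // For 1800 partitions this is 1 statement instead of ~1800.
    System.out.println(batchedQuery(List.of(3945L, 3946L, 3947L)));
  }
}
```

Real code would of course use bound parameters rather than string concatenation; the point is only the statement count.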
[jira] [Commented] (HIVE-2935) Implement HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729921#comment-13729921 ] Yu Gao commented on HIVE-2935: -- Maybe I missed the discussion here, but seems to me that HiveServer2 can be configured with either SASL GSS (Kerberos) or SASL PLAIN (LDAP, CUSTOM username/password authentication), but not both simultaneously. Can I ask the reason for this, and whether it is straightforward to enable PLAIN and GSS simultaneously in the future? This is very useful for applications that have been supporting LDAP authentication on Hive, and when turn to Kerberos, legacy clients or non-kerberos clients would still be able to access kerberized HiveServer2. Thanks! Implement HiveServer2 - Key: HIVE-2935 URL: https://issues.apache.org/jira/browse/HIVE-2935 Project: Hive Issue Type: New Feature Components: HiveServer2, Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach Labels: HiveServer2 Fix For: 0.11.0 Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, HIVE-2935.3.patch.gz, HIVE-2935-4.changed-files-only.patch, HIVE-2935-4.nothrift.patch, HIVE-2935-4.patch, HIVE-2935-5.beeline.patch, HIVE-2935-5.core-hs2.patch, HIVE-2935-5.thrift-gen.patch, HIVE-2935-7.patch.tar.gz, HIVE-2935-7.testerrs.patch, HIVE-2935.fix.unsecuredoAs.patch, HS2-changed-files-only.patch, HS2-with-thrift-patch-rebased.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4990) ORC seeks fails with non-zero offset or column projection
[ https://issues.apache.org/jira/browse/HIVE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4990: Status: Patch Available (was: Open) ORC seeks fails with non-zero offset or column projection - Key: HIVE-4990 URL: https://issues.apache.org/jira/browse/HIVE-4990 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.11.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.11.1 Attachments: HIVE-4990.D12009.1.patch The ORC reader gets exceptions when seeking with non-zero offsets or column projection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4794: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks, Tony! Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, HIVE-4794.2.patch, HIVE-4794.3.patch, HIVE-4794.3-vectorization.patch, HIVE-4794.4-vectorization.patch, HIVE-4794.5-vectorization.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4990) ORC seeks fails with non-zero offset or column projection
[ https://issues.apache.org/jira/browse/HIVE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729945#comment-13729945 ] Owen O'Malley commented on HIVE-4990: - The fix protects against null in seek and skiprows and subtracts off the missing firstRow for readers that are only reading a part of the file. ORC seeks fails with non-zero offset or column projection - Key: HIVE-4990 URL: https://issues.apache.org/jira/browse/HIVE-4990 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.11.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.11.1 Attachments: HIVE-4990.D12009.1.patch The ORC reader gets exceptions when seeking with non-zero offsets or column projection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4971) Unit test failure in TestVectorTimestampExpressions
[ https://issues.apache.org/jira/browse/HIVE-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4971: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Gopal! Unit test failure in TestVectorTimestampExpressions --- Key: HIVE-4971 URL: https://issues.apache.org/jira/browse/HIVE-4971 Project: Hive Issue Type: Sub-task Components: Tests, UDF Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Gopal V Fix For: vectorization-branch Attachments: HIVE-4971.patch, HIVE-4971-vectorization.patch Unit test testVectorUDFUnixTimeStampLong is failing TestVectorTimestampExpressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4990) ORC seeks fails with non-zero offset or column projection
[ https://issues.apache.org/jira/browse/HIVE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4990: -- Attachment: HIVE-4990.D12009.1.patch omalley requested code review of HIVE-4990 [jira] ORC seeks fails with non-zero offset or column projection. Reviewers: JIRA HIVE-4990 The ORC reader gets exceptions when seeking with non-zero offsets or column projection. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12009 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/28683/ To: JIRA, omalley ORC seeks fails with non-zero offset or column projection - Key: HIVE-4990 URL: https://issues.apache.org/jira/browse/HIVE-4990 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.11.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.11.1 Attachments: HIVE-4990.D12009.1.patch The ORC reader gets exceptions when seeking with non-zero offsets or column projection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4995) select * may incorrectly return empty fields with hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729950#comment-13729950 ] Hive QA commented on HIVE-4995: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596171/HIVE-4995.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2760 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/310/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/310/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. select * may incorrectly return empty fields with hbase-handler --- Key: HIVE-4995 URL: https://issues.apache.org/jira/browse/HIVE-4995 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.11.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-4995.1.patch.txt, HIVE-4995.1.patch.txt HIVE-3725 added capability to pull hbase columns with prefixes. However the way the current logic to add columns stands in HiveHBaseTableInput format, it might cause some columns to incorrectly display empty fields. 
Consider the following query:
{noformat}
CREATE EXTERNAL TABLE test_table(key string, value1 map<string,string>, value2 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf-a:prefix.*,cf-a:another_col")
TBLPROPERTIES ("hbase.table.name" = "test_table");
{noformat}
Given the existing logic in HiveHBaseTableInputFormat:
{code}
for (int i = 0; i < columnsMapping.size(); i++) {
  ColumnMapping colMap = columnsMapping.get(i);
  if (colMap.hbaseRowKey) {
    continue;
  }
  if (colMap.qualifierName == null) {
    scan.addFamily(colMap.familyNameBytes);
  } else {
    scan.addColumn(colMap.familyNameBytes, colMap.qualifierNameBytes);
  }
}
{code}
So for the above query, 'addFamily' will be called first, followed by 'addColumn' for the same column family cf-a. This wipes away whatever was set by the 'addFamily' call in the previous step, resulting in an empty column when queried. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
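The wipe-out can be modeled without HBase at all. The sketch below (illustrative Java, not the HBase Scan API or the eventual patch) mimics the relevant Scan behavior, where adding a specific qualifier replaces an earlier whole-family selection, and shows one possible guard: skip addColumn for families already selected whole.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the ordering bug: like HBase's Scan, a qualifier-level add
// replaces an earlier whole-family selection for the same family.
public class ScanColumnSketch {
  // A null qualifier set means "the whole family is selected".
  final Map<String, Set<String>> familyMap = new HashMap<>();

  void addFamily(String family) {
    familyMap.put(family, null);
  }

  void addColumn(String family, String qualifier) {
    // computeIfAbsent treats a null value as absent, so this replaces a
    // whole-family selection with a single-qualifier set -- the bug.
    familyMap.computeIfAbsent(family, f -> new HashSet<>()).add(qualifier);
  }

  // Guarded variant: never narrow a family that was already added whole.
  void addColumnGuarded(String family, String qualifier) {
    if (familyMap.containsKey(family) && familyMap.get(family) == null) {
      return; // keep the whole-family selection (covers cf-a:prefix.*)
    }
    addColumn(family, qualifier);
  }

  boolean selectsWholeFamily(String family) {
    return familyMap.containsKey(family) && familyMap.get(family) == null;
  }

  public static void main(String[] args) {
    ScanColumnSketch buggy = new ScanColumnSketch();
    buggy.addFamily("cf-a");                 // for the cf-a:prefix.* mapping
    buggy.addColumn("cf-a", "another_col");  // narrows cf-a, losing prefix.*
    ScanColumnSketch fixed = new ScanColumnSketch();
    fixed.addFamily("cf-a");
    fixed.addColumnGuarded("cf-a", "another_col");
    System.out.println(buggy.selectsWholeFamily("cf-a") + " vs " + fixed.selectsWholeFamily("cf-a"));
  }
}
```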
[jira] [Commented] (HIVE-4967) Don't serialize unnecessary fields in query plan
[ https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729998#comment-13729998 ] Ashutosh Chauhan commented on HIVE-4967: Ping [~brocknoland] : ) Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4967.patch There are quite a few fields which need not to be serialized since they are initialized anyways in backend. We need not to serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4967) Don't serialize unnecessary fields in query plan
[ https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730004#comment-13730004 ] Brock Noland commented on HIVE-4967: Hey thanks for the ping! I will review right now. Don't serialize unnecessary fields in query plan Key: HIVE-4967 URL: https://issues.apache.org/jira/browse/HIVE-4967 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-4967.patch There are quite a few fields which need not to be serialized since they are initialized anyways in backend. We need not to serialize them in our plan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4964: -- Attachment: HIVE-4964.D11985.2.patch hbutani updated the revision HIVE-4964 [jira] Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced. - fix formatting issues - fix lint issues Reviewers: JIRA, ashutoshc REVISION DETAIL https://reviews.facebook.net/D11985 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11985?vs=36957id=37053#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java To: JIRA, ashutoshc, hbutani Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730044#comment-13730044 ] Harish Butani commented on HIVE-4964: - Yes the formatting issues are long overdue. These are carryover from when we initially wrote a lot of this code; was using a different set of formatting rules. Cannot get eclipse to auto fix based on Hive's rules; so manually fixing them(as much as possible). So please bear with us... Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730038#comment-13730038 ] Hive QA commented on HIVE-4388: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596151/HIVE-4388.patch {color:green}SUCCESS:{color} +1 2759 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/311/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/311/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Inconsistent results with and without index. Is this a bug?
Hive Dev Team, Greetings! We have encountered an issue when using Hive 0.8.1.8 and Hive 0.11.0. After some investigation, we think this looks like a bug in Hive. I'm therefore sending this email to report the issue and to confirm with you. Please let me know if this is not the correct mailing list for this kind of topic. The issue we had is related to indexed queries on external tables stored as sequence files. For example, suppose we have a simple table like the one created below:

CREATE TABLE hive_test ( id int, name string, info string ) STORED AS SEQUENCEFILE;

We first insert 5000 rows with the same id (e.g., id = 1) into this table. We then count the total number of rows in the table by running the query below and get the correct result, 5000.

select count(*) from hive_test where id = 1;

After this, we create an index on id:

CREATE INDEX test_index ON TABLE hive_test(id) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
ALTER INDEX test_index ON hive_test REBUILD;
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=0;

Then, we run the same query 'select count(*) from hive_test where id = 1;' again but get a different result (count > 5000).
We tried to dig into the Hive source code and found the following piece of code in HiveIndexedInputFormat.java, which might be the root cause of the duplicated rows:

if (split.inputFormatClassName().contains("RCFile") ||
    split.inputFormatClassName().contains("SequenceFile")) {
  if (split.getStart() > SequenceFile.SYNC_INTERVAL) {
    newSplit = new HiveInputSplit(new FileSplit(split.getPath(),
        split.getStart() - SequenceFile.SYNC_INTERVAL,
        split.getLength() + SequenceFile.SYNC_INTERVAL,
        split.getLocations()), split.inputFormatClassName());
  }
}

According to my understanding of SequenceFile and SequenceFileRecordReader, I think it's unnecessary and incorrect to add the extra 2000 bytes to the beginning of each input split, because it causes some of the rows in the overlapping regions to be processed by two mappers. Please correct me if I'm wrong. Thank you, Xing
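The overlap Xing describes can be made concrete with a small calculation (an illustrative sketch of the argument above, not Hive code): rewinding every non-zero split start by SYNC_INTERVAL makes each split overlap its predecessor by exactly SYNC_INTERVAL bytes, so records whose sync point falls inside that window can be consumed by two mappers.

```java
// Sketch of the split-rewind overlap: apply the rewind from
// HiveIndexedInputFormat to two adjacent splits and measure how many
// bytes they now cover twice.
public class SplitOverlapSketch {
  static final long SYNC_INTERVAL = 2000; // SequenceFile.SYNC_INTERVAL

  /** [start, end) of a split after the rewind quoted above. */
  static long[] rewoundSplit(long start, long length) {
    if (start > SYNC_INTERVAL) {
      return new long[] {start - SYNC_INTERVAL, start + length};
    }
    return new long[] {start, start + length};
  }

  /** Bytes covered by both half-open ranges. */
  static long overlap(long[] a, long[] b) {
    return Math.max(0, Math.min(a[1], b[1]) - Math.max(a[0], b[0]));
  }

  public static void main(String[] args) {
    long[] s0 = rewoundSplit(0, 64_000_000);          // first split, unrewound
    long[] s1 = rewoundSplit(64_000_000, 64_000_000); // second split, rewound
    System.out.println("overlap bytes: " + overlap(s0, s1));
  }
}
```

Whether the overlap actually yields duplicate rows depends on how the record reader handles sync markers at split boundaries, which is exactly the question the mail raises.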
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730058#comment-13730058 ] Edward Capriolo commented on HIVE-4964: --- With eclipse I have had luck using this as my code-style settings. typically you can highlight some code or the entire file, right click and then ask eclipse to reformat. https://github.com/zznate/intravert-ug/blob/master/src/main/resources/eclipseUima_code_style_prefs.xml Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: HIVE-4870.patch Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: (was: HIVE-4870.patch) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4683) fix coverage org.apache.hadoop.hive.cli
[ https://issues.apache.org/jira/browse/HIVE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730101#comment-13730101 ] Hudson commented on HIVE-4683: -- SUCCESS: Integrated in Hive-trunk-h0.21 #2246 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2246/]) HIVE-4683 : fix coverage org.apache.hadoop.hive.cli (Aleksey Gorshkov via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1510346) * /hive/trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java * /hive/trunk/cli/src/java/org/apache/hadoop/hive/cli/RCFileCat.java * /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliDriverMethods.java * /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliSessionState.java * /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestOptionsProcessor.java * /hive/trunk/cli/src/test/org/apache/hadoop/hive/cli/TestRCFileCat.java fix coverage org.apache.hadoop.hive.cli --- Key: HIVE-4683 URL: https://issues.apache.org/jira/browse/HIVE-4683 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Fix For: 0.12.0 Attachments: HIVE-4683-branch-0.10.patch, HIVE-4683-branch-0.10-v1.patch, HIVE-4683-branch-0.11-v1.patch, HIVE-4683-trunk.patch, HIVE-4683-trunk-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730111#comment-13730111 ] Hive QA commented on HIVE-4997: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596206/HIVE-4997.1.patch {color:green}SUCCESS:{color} +1 2760 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/312/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/312/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. HCatalog doesn't allow multiple input tables Key: HIVE-4997 URL: https://issues.apache.org/jira/browse/HIVE-4997 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Daniel Intskirveli Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4997.1.patch HCatInputFormat does not allow reading from multiple hive tables in the same MapReduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4826) Setup build infrastructure for tez
[ https://issues.apache.org/jira/browse/HIVE-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4826: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to tez branch. Setup build infrastructure for tez -- Key: HIVE-4826 URL: https://issues.apache.org/jira/browse/HIVE-4826 Project: Hive Issue Type: New Feature Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-4826.2.patch, HIVE-4826.patch Address changes required in ivy and build xml files to support tez. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-5001) [WebHCat] JobState is read/written with different user credentials
[ https://issues.apache.org/jira/browse/HIVE-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-5001: Assignee: Eugene Koifman [WebHCat] JobState is read/written with different user credentials -- Key: HIVE-5001 URL: https://issues.apache.org/jira/browse/HIVE-5001 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman JobState can be persisted to HDFS or Zookeeper. At various points in the lifecycle it's accessed with different user credentials and thus may cause errors depending on how permissions are set. Example: When submitting a MR job, templeton.JarDelegator is used. It calls LauncherDelegator#queueAsUser(), which runs TempletonControllerJob with UserGroupInformation.doAs(). TempletonControllerJob will in turn create JobState and persist it. LauncherDelegator.registerJob() also modifies JobState, but without doing a doAs(). So in the latter case it's possible that the persisted JobState is written by a different user than the one that created/owns the file. templeton.tool.HDFSCleanup tries to delete these files w/o doAs. The 'childid' file, for example, is created with rw-r--r--, and its parent directory (job_201308051224_0001) has rwxr-xr-x. HDFSStorage doesn't set file permissions explicitly, so it must be using default permissions. So there is a potential issue here (depending on UMASK), especially once HIVE-4601 is addressed. Actually, even w/o HIVE-4601 the user that owns the WebHCat process is likely different than the one submitting a request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
Owen O'Malley created HIVE-5002: --- Summary: Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5002: Component/s: File Formats Affects Version/s: 0.12.0 Fix Version/s: 0.12.0 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.12.0 Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5002: -- Attachment: HIVE-5002.D12015.1.patch omalley requested code review of HIVE-5002 [jira] Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private. Reviewers: JIRA HIVE-5002 Some users want to be able to access the rowIndexes directly from ORC reader extensions. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12015 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/28713/ To: JIRA, omalley Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.12.0 Attachments: HIVE-5002.D12015.1.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4051: -- Attachment: HIVE-4051.D11805.8.patch sershe updated the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N queries when querying partitions is slow. Moved the code for SQL filter generation and usage into a separate class. The only other changes are the latest two comments on Phabricator, as well as some minor cleanup like null checks. Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D11805 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11805?vs=36879&id=37071#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java To: JIRA, ashutoshc, sershe Cc: brock Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client.
The following 12 queries were repeated for each partition, generating a storm of SQL queries:
{code}
4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` = 4871 AND `STRING_LIST_ID_KID` IS NOT NULL
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` = 4871
4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` = 4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL)
{code}
This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same hive client. The queries are automatically generated by JDO/DataNucleus, which makes it nearly impossible to rewrite them into a single denormalized join operation and process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count.
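One way out of the 1+N pattern above — the direction the attached MetaStoreDirectSql.java work moves in — is to batch the per-ID lookups into a single statement. A minimal sketch (hypothetical column list and query-building helper, not the actual Hive code):

{code}
import java.util.List;
import java.util.stream.Collectors;

public class BatchedLookup {
    // Instead of one round trip per SD_ID (the 1+N pattern in the log above),
    // collect the IDs and issue a single IN (...) query.
    static String batchedQuery(List<Long> sdIds) {
        String ids = sdIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return "SELECT `SD_ID`,`LOCATION` FROM `SDS` WHERE `SD_ID` IN (" + ids + ")";
    }

    public static void main(String[] args) {
        // One query for three storage descriptors instead of three queries.
        System.out.println(batchedQuery(List.of(4871L, 4872L, 4873L)));
    }
}
{code}

The rows then get assembled into objects in application code; the trade-off is hand-written SQL that bypasses JDO, which is exactly why the patch isolates it in a separate class.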
[jira] [Resolved] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-4916. -- Resolution: Fixed Committed to tez branch. Add TezWork --- Key: HIVE-4916 URL: https://issues.apache.org/jira/browse/HIVE-4916 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4916.1.patch.branch, HIVE-4916.2.patch.txt TezWork is the class that encapsulates all the info needed to execute a single Tez job (i.e.: a dag of map or reduce work). NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730137#comment-13730137 ] Phabricator commented on HIVE-4051: --- sershe has commented on the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N queries when querying partitions is slow. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1681 fixed metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1776 fixed REVISION DETAIL https://reviews.facebook.net/D11805 To: JIRA, ashutoshc, sershe Cc: brock Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client.
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5002: Status: Patch Available (was: Open) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.12.0 Attachments: HIVE-5002.D12015.1.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4991) hive build with 0.20 is broken
[ https://issues.apache.org/jira/browse/HIVE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4991: Attachment: HIVE-4991.3.patch.txt HIVE-4991.3.patch.txt - change to use version number from ivy/libraries.properties hive build with 0.20 is broken -- Key: HIVE-4991 URL: https://issues.apache.org/jira/browse/HIVE-4991 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Edward Capriolo Priority: Blocker Labels: newbie Attachments: HIVE-4991.2.patch.txt, HIVE-4991.3.patch.txt, HIVE-4991.patch.txt As reported in HIVE-4911 ant clean package -Dhadoop.mr.rev=20 Fails with - {code} compile: [echo] Project: ql [javac] Compiling 898 source files to /Users/malakar/code/oss/hive/build/ql/classes [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:35: package org.apache.commons.io does not exist [javac] import org.apache.commons.io.FileUtils; [javac] ^ [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:743: cannot find symbol [javac] symbol : variable FileUtils [javac] location: class org.apache.hadoop.hive.ql.session.SessionState [javac] FileUtils.deleteDirectory(resourceDir); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4991) hive build with 0.20 is broken
[ https://issues.apache.org/jira/browse/HIVE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730182#comment-13730182 ] Gunther Hagleitner commented on HIVE-4991: -- [~appodictic] If you're fine with patch .3 (only difference is to reuse existing property) I'll commit that. hive build with 0.20 is broken -- Key: HIVE-4991 URL: https://issues.apache.org/jira/browse/HIVE-4991 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Edward Capriolo Priority: Blocker Labels: newbie Attachments: HIVE-4991.2.patch.txt, HIVE-4991.3.patch.txt, HIVE-4991.patch.txt As reported in HIVE-4911 ant clean package -Dhadoop.mr.rev=20 Fails with - {code} compile: [echo] Project: ql [javac] Compiling 898 source files to /Users/malakar/code/oss/hive/build/ql/classes [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:35: package org.apache.commons.io does not exist [javac] import org.apache.commons.io.FileUtils; [javac] ^ [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:743: cannot find symbol [javac] symbol : variable FileUtils [javac] location: class org.apache.hadoop.hive.ql.session.SessionState [javac] FileUtils.deleteDirectory(resourceDir); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4991) hive build with 0.20 is broken
[ https://issues.apache.org/jira/browse/HIVE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730185#comment-13730185 ] Gunther Hagleitner commented on HIVE-4991: -- [~amalakar] I cannot reproduce your findings. With Ed's patch in place everything compiles for me for 20, 20S and 23. All of them have commons-io on the classpath now. hive build with 0.20 is broken -- Key: HIVE-4991 URL: https://issues.apache.org/jira/browse/HIVE-4991 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Edward Capriolo Priority: Blocker Labels: newbie Attachments: HIVE-4991.2.patch.txt, HIVE-4991.3.patch.txt, HIVE-4991.patch.txt As reported in HIVE-4911 ant clean package -Dhadoop.mr.rev=20 Fails with - {code} compile: [echo] Project: ql [javac] Compiling 898 source files to /Users/malakar/code/oss/hive/build/ql/classes [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:35: package org.apache.commons.io does not exist [javac] import org.apache.commons.io.FileUtils; [javac] ^ [javac] /Users/malakar/code/oss/hive/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:743: cannot find symbol [javac] symbol : variable FileUtils [javac] location: class org.apache.hadoop.hive.ql.session.SessionState [javac] FileUtils.deleteDirectory(resourceDir); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 2 errors {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4573) Support alternate table types for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4573: - Resolution: Fixed Fix Version/s: 0.12.0 Release Note: Adds new config parameter that needs to be documented:
{code}
<property>
  <name>hive.server2.table.type.mapping</name>
  <value>HIVE</value>
  <description>
    This setting reflects how HiveServer will report the table types for JDBC
    and other client implementations that retrieve the available tables and
    supported table types.
    HIVE: Exposes Hive's native table types like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW
    CLASSIC: More generic types like TABLE and VIEW
  </description>
</property>
{code}
Status: Resolved (was: Patch Available) Committed to trunk. Thanks Prasad! Support alternate table types for HiveServer2 - Key: HIVE-4573 URL: https://issues.apache.org/jira/browse/HIVE-4573 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.10.0 Reporter: Johndee Burks Assignee: Prasad Mujumdar Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4573.1.patch, HIVE-4573.2.patch The getTables jdbc function no longer returns information when using normal JDBC table types like TABLE or VIEW. You must now use a more specific type such as MANAGED_TABLE or VIRTUAL_VIEW. An example application that will fail to return results against 0.10 is below; it works without issue in 0.9. In my 0.10 test I used HS2.
{code}
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveDriver;
import java.sql.DatabaseMetaData;

public class TestGet {
  private static String driverName = "org.apache.hive.jdbc.HiveDriver";

  /**
   * @param args
   * @throws SQLException
   */
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driverName);
    } catch (ClassNotFoundException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
      System.exit(1);
    }
    Connection con = DriverManager.getConnection("jdbc:hive2://hostname:1/default");
    DatabaseMetaData dbmd = con.getMetaData();
    String[] types = {"TABLE"};
    ResultSet rs = dbmd.getTables(null, null, "%", types);
    while (rs.next()) {
      System.out.println(rs.getString("TABLE_NAME"));
    }
  }
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
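The CLASSIC mode described in the release note above amounts to translating Hive's native table types into the generic JDBC ones. A hypothetical sketch of that mapping (illustrative only, not HiveServer2's actual code; the EXTERNAL_TABLE entry follows the release note's "more generic types like TABLE and VIEW" wording):

{code}
import java.util.Map;

public class TableTypeMapping {
    // Hypothetical CLASSIC mapping: Hive-native table types -> generic JDBC
    // table types, per the HIVE-4573 release note.
    static final Map<String, String> CLASSIC = Map.of(
        "MANAGED_TABLE", "TABLE",
        "EXTERNAL_TABLE", "TABLE",
        "VIRTUAL_VIEW", "VIEW");

    static String toClassic(String hiveType) {
        // Unknown types pass through unchanged.
        return CLASSIC.getOrDefault(hiveType, hiveType);
    }

    public static void main(String[] args) {
        System.out.println(toClassic("MANAGED_TABLE")); // TABLE
        System.out.println(toClassic("VIRTUAL_VIEW"));  // VIEW
    }
}
{code}

With hive.server2.table.type.mapping set to CLASSIC, a client passing {TABLE} to getTables() would again match managed and external tables, which is what the failing 0.10 example above relies on.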
[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow
[ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-4051: --- Status: Open (was: Patch Available) Hive's metastore suffers from 1+N queries when querying partitions is slow Key: HIVE-4051 URL: https://issues.apache.org/jira/browse/HIVE-4051 Project: Hive Issue Type: Bug Components: Clients, Metastore Environment: RHEL 6.3 / EC2 C1.XL Reporter: Gopal V Assignee: Sergey Shelukhin Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch Hive's query client takes a long time to initialize and start planning queries because of delays in creating all the MTable/MPartition objects. For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing approximately 5900 queries to the mysql database. Several of those queries fetch exactly one row to create a single object on the client.
The following 12 queries were repeated for each partition, generating a storm of SQL queries:
{code}
4 Query SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` = 4871
4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` = 4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` = 4871 AND `STRING_LIST_ID_KID` IS NOT NULL
4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` = 4871
4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` = 4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL)
{code}
This data is not detached or cached, so this operation is performed during every query plan for the partitions, even in the same Hive client. The queries are automatically generated by JDO/DataNucleus, which makes it nearly impossible to rewrite them into a single denormalized join operation and process it locally. Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query count.
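The 1+N shape described above is just round-trip arithmetic: one query to enumerate the partitions, then a fixed set of queries per partition, versus batching partition keys into one `IN (...)` query per chunk. A minimal sketch of that arithmetic (the class and method names here are invented for illustration; this is not Hive or DataNucleus code):

```java
/**
 * Round-trip arithmetic behind the 1+N query pattern: per-object fetching
 * issues a fixed number of queries for every partition, while batching
 * keys into IN (...) lists needs only one query per chunk.
 */
public class QueryCountSketch {

    /** 1 query to list partition IDs, then k queries per partition. */
    static int perPartitionRoundTrips(int partitions, int queriesPerPartition) {
        return 1 + partitions * queriesPerPartition;
    }

    /** 1 query to list partition IDs, then one batched IN (...) query per chunk. */
    static int batchedRoundTrips(int partitions, int batchSize) {
        return 1 + (partitions + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        // 1800 partitions x 12 repeated queries each, as in this issue:
        System.out.println(perPartitionRoundTrips(1800, 12)); // 21601
        // Batching 100 partition IDs per IN (...) query:
        System.out.println(batchedRoundTrips(1800, 100)); // 19
    }
}
```

Either way the dominant term of the per-object approach is linear in the partition count, which is consistent with the thousands of queries observed for 1800 partitions.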
[jira] [Commented] (HIVE-4990) ORC seeks fails with non-zero offset or column projection
[ https://issues.apache.org/jira/browse/HIVE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730235#comment-13730235 ] Hive QA commented on HIVE-4990: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12596214/HIVE-4990.D12009.1.patch {color:green}SUCCESS:{color} +1 2759 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/314/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/314/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ORC seeks fails with non-zero offset or column projection - Key: HIVE-4990 URL: https://issues.apache.org/jira/browse/HIVE-4990 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.11.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.11.1 Attachments: HIVE-4990.D12009.1.patch The ORC reader gets exceptions when seeking with non-zero offsets or column projection.