[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756385#comment-13756385 ] Thejas M Nair commented on HIVE-5018: - I haven't seen that checkstyle error you are seeing. I am not sure why it is happening. You can try canceling the patch, uploading a new file, and making it patch available again to see if that happens on the pre-commit test environment as well. I went through some of the changes, and I see that you are trying to save on object reference creation, such as the following -
{code}
--- a/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
+++ b/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
@@ -113,8 +113,9 @@ public boolean cleanUp(String rowID) {
     scan.setFilter(filter);
     ResultScanner scanner = htable.getScanner(scan);
     ArrayList<Delete> toDelete = new ArrayList<Delete>();
+    Delete delete;
     for (Result result : scanner) {
-      Delete delete = new Delete(result.getRow());
+      delete = new Delete(result.getRow());
       toDelete.add(delete);
     }
     htable.delete(toDelete);
{code}
While object creation has significant costs associated with it, I don't think this reference re-use will have any real impact. A reference is like a pointer in C/C++: a memory location that stores the address of the object. The JVM would be able to re-use this memory location in the existing implementation. Can you give more details of the arithmetic program you ran to check the performance difference (including the code, total runtime, and how many times you ran it)? Avoiding object instantiation in loops (issue 6) Key: HIVE-5018 URL: https://issues.apache.org/jira/browse/HIVE-5018 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: HIVE-5018.1.patch.txt Object instantiation inside loops is very expensive.
Where possible, object references should be created outside the loop so that they can be reused. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
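A minimal, self-contained sketch of Thejas's point above (the nested Delete class here is a stand-in for HBase's Delete, not the real one): hoisting the *declaration* out of the loop does not remove the per-iteration allocation, because `new` still runs on every pass; only the reference slot moves, and the JIT treats the local variable identically either way.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceHoisting {
    // Stand-in for org.apache.hadoop.hbase.client.Delete.
    static class Delete {
        final byte[] row;
        Delete(byte[] row) { this.row = row; }
    }

    static List<Delete> declarationInsideLoop(int n) {
        List<Delete> toDelete = new ArrayList<Delete>();
        for (int i = 0; i < n; i++) {
            Delete delete = new Delete(new byte[] {(byte) i}); // n allocations
            toDelete.add(delete);
        }
        return toDelete;
    }

    static List<Delete> declarationOutsideLoop(int n) {
        List<Delete> toDelete = new ArrayList<Delete>();
        Delete delete; // only the reference slot is hoisted
        for (int i = 0; i < n; i++) {
            delete = new Delete(new byte[] {(byte) i}); // still n allocations
            toDelete.add(delete);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Both variants allocate exactly n Delete objects.
        System.out.println(declarationInsideLoop(100).size());  // 100
        System.out.println(declarationOutsideLoop(100).size()); // 100
    }
}
```

Any micro-benchmark claiming a difference between the two forms is likely measuring JIT warm-up or GC noise rather than the declaration placement.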
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756390#comment-13756390 ] Vaibhav Gumashta commented on HIVE-4617: Just for comparison, the metastore currently has 200 min and 100,000 max threads. Proposing here that HS2 have 500 Thrift workers and 500 async threads. Also, I was thinking of changing the async thread pool to a cached thread pool, which will not keep all 500 threads alive all the time, but will create new threads when required, keeping the unused ones alive for a certain time before purging them. Will be interesting to hear more thoughts. Thanks! ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
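The pool behavior described above can be sketched with a plain ThreadPoolExecutor (class and method names here are illustrative, not the actual HS2 code; the 10-second keep-alive is an arbitrary placeholder). Note that Executors.newCachedThreadPool() is unbounded, so a capped pool with on-demand creation and idle-thread purging needs allowCoreThreadTimeOut:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncPoolSketch {
    static ThreadPoolExecutor newAsyncPool(int maxThreads, long keepAliveSecs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,          // core == max, i.e. a hard cap
            keepAliveSecs, TimeUnit.SECONDS, // idle threads die after this
            new LinkedBlockingQueue<Runnable>());
        // Without this flag, core threads never time out and all
        // maxThreads stay alive once created.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newAsyncPool(500, 10);
        System.out.println(pool.getMaximumPoolSize()); // 500
        System.out.println(pool.getPoolSize());        // 0 -- threads are created on demand
        pool.shutdown();
    }
}
```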
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756398#comment-13756398 ] Hudson commented on HIVE-5163: -- FAILURE: Integrated in Hive-trunk-hadoop2 #398 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/398/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756399#comment-13756399 ] Hudson commented on HIVE-5163: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #82 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/82/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756400#comment-13756400 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #82 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/82/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
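The fix described in this issue can be modeled in miniature (class and task names below are hypothetical, not Hive's actual SQLOperation/Task classes): derive hasResultSet from whether the compiled plan actually contains a fetch task, rather than assuming every statement produces rows.

```java
import java.util.Arrays;
import java.util.List;

public class ResultSetDecision {
    // Report a result set only when the plan will actually fetch rows.
    static boolean hasResultSet(List<String> planTasks) {
        return planTasks.contains("FetchTask");
    }

    public static void main(String[] args) {
        // A plain SELECT ends in a fetch; a CTAS ends in move/DDL work.
        List<String> select = Arrays.asList("MapRedTask", "FetchTask");
        List<String> ctas   = Arrays.asList("MapRedTask", "MoveTask", "DDLTask");
        System.out.println(hasResultSet(select)); // true
        System.out.println(hasResultSet(ctas));   // false
    }
}
```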
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756418#comment-13756418 ] Phabricator commented on HIVE-4617: --- cwsteinbach has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS conf/hive-default.xml.template:1857 This value doesn't match the one listed in HiveConf (500). conf/hive-default.xml.template:1864 Do you think people will actually want to set this value to a fraction of a second? If not, I recommend changing the units to seconds. service/if/TCLIService.thrift:42 Please add a blank line between lines 41 and 42. service/src/java/org/apache/hive/service/cli/OperationState.java:68 I think WAITING->FINISHED and WAITING->CLOSED probably aren't valid transitions, but I'm not sure. What do you think? service/src/java/org/apache/hive/service/cli/session/SessionManager.java:59 s/VG/TODO/ service/src/java/org/apache/hive/service/cli/session/SessionManager.java:81 10,000ms should not be hardcoded. Please reference 'timeout' instead. service/src/java/org/apache/hive/service/cli/OperationState.java:75 If RUNNING->WAITING is not a valid transition (which makes sense to me), then maybe we should use the adjective PENDING instead of WAITING. There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
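The state-transition questions in the review above can be made concrete with a small validation sketch (not the actual OperationState code; the transition table below is illustrative and follows the reviewer's suggestion of PENDING rather than WAITING):

```java
import java.util.EnumMap;
import java.util.EnumSet;

public class OperationStateSketch {
    enum State { INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR }

    private static final EnumMap<State, EnumSet<State>> VALID =
        new EnumMap<State, EnumSet<State>>(State.class);
    static {
        // Illustrative table: PENDING may start running or be canceled, but
        // may not jump straight to FINISHED/CLOSED, and a RUNNING operation
        // may not fall back to PENDING.
        VALID.put(State.INITIALIZED, EnumSet.of(State.PENDING, State.RUNNING, State.CANCELED, State.CLOSED));
        VALID.put(State.PENDING,     EnumSet.of(State.RUNNING, State.CANCELED, State.ERROR));
        VALID.put(State.RUNNING,     EnumSet.of(State.FINISHED, State.CANCELED, State.ERROR));
        VALID.put(State.FINISHED,    EnumSet.of(State.CLOSED));
        VALID.put(State.CANCELED,    EnumSet.of(State.CLOSED));
        VALID.put(State.ERROR,       EnumSet.of(State.CLOSED));
        VALID.put(State.CLOSED,      EnumSet.noneOf(State.class));
    }

    static boolean isValidTransition(State from, State to) {
        EnumSet<State> allowed = VALID.get(from);
        return allowed != null && allowed.contains(to);
    }

    public static void main(String[] args) {
        System.out.println(isValidTransition(State.PENDING, State.RUNNING));  // true
        System.out.println(isValidTransition(State.PENDING, State.FINISHED)); // false
        System.out.println(isValidTransition(State.RUNNING, State.PENDING));  // false
    }
}
```

Encoding the table explicitly makes "is X->Y valid?" a code-review question about one map entry rather than scattered if-statements.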
[jira] [Commented] (HIVE-5196) ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead.
[ https://issues.apache.org/jira/browse/HIVE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756425#comment-13756425 ] Carl Steinbach commented on HIVE-5196: -- [~vgumashta] Thanks for catching this. I think we may actually want to remove these statements altogether, the rationale being that we probably don't want to write a message to the server's log every time a client tries to execute an illegal statement or calls an RPC with invalid input parameter values. Ideally this error information should be returned directly to the client instead. ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. - Key: HIVE-5196 URL: https://issues.apache.org/jira/browse/HIVE-5196 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. Using e.printStackTrace is not suitable for production.
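A sketch of the logging alternative under discussion (java.util.logging stands in here only to keep the example self-contained; Hive itself uses Apache Commons Logging, and the message text and method name are illustrative): pass the exception object to the logger so the stack trace goes to the configured log, where it can also be filtered or dropped, instead of unconditionally hitting stderr.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    // Instead of e.printStackTrace(): log the message and the throwable
    // together, so the stack trace lands in the server log (or is
    // suppressed, depending on the configured log level).
    static String handle(Exception e) {
        String msg = "Error executing statement: " + e.getMessage();
        LOG.log(Level.WARNING, msg, e);
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(handle(new IllegalArgumentException("bad parameter")));
    }
}
```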
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756429#comment-13756429 ] Phabricator commented on HIVE-4617: --- vaibhavgumashta has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:152 This should go. RUNNING is set in runInternal. REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756465#comment-13756465 ] Hudson commented on HIVE-5137: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #149 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/149/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756464#comment-13756464 ] Hudson commented on HIVE-5163: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #149 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/149/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756583#comment-13756583 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-hadoop2 #399 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/399/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Status: Patch Available (was: Open) trigger the pre-commit build ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Status: Open (was: Patch Available) ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756646#comment-13756646 ] Brock Noland commented on HIVE-1511: Looks like there was a compilation issue in HCatalog. I see a ptest trunk build ran successfully after this, so I kicked it off again. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes.
Re: RFC: Major HCatalog refactoring
OK, that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically and manually, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11-version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods on existing classes? I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog, so it has to happen before the branching. Most likely at the beginning of next week. Thanks On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog as was agreed when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive in 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically: 1.
HIVE-4895 - change package name from org.apache.hcatalog to org.apache.hive.hcatalog 2. HIVE-4896 - create binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12 For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements, as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has an M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work without change with Hive 0.12. The proposal is to make changes that have as little impact on the build system as possible, in part to make the upcoming ‘mavenization’ of Hive easier, in part to make the changes more manageable. The list of public interfaces (and their transitive closure) for which backwards compat will be provided: 1. HCatLoader 2. HCatStorer 3. HCatInputFormat 4. HCatOutputFormat 5. HCatReader 6. HCatWriter 7. HCatRecord 8. HCatSchema To achieve this, the 0.11 versions of these classes will be added in the org.apache.hcatalog package (after item 1 is done). Each of these classes -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Attachment: HIVE-5149.3.patch ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Attachment: (was: HIVE-5149.3.patch) ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756680#comment-13756680 ] Edward Capriolo commented on HIVE-5137: --- If you call fetchAll() does it return an empty List or throw an exception? There may be some users calling fetchAll() regardless of the query. A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756671#comment-13756671 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-h0.21 #2307 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2307/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5112: --- Attachment: HIVE-5112.2.patch v2 of the patch uses Hadoop 2.1.0-beta. Upgrade protobuf to 2.5 from 2.4 Key: HIVE-5112 URL: https://issues.apache.org/jira/browse/HIVE-5112 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Owen O'Malley Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch Hadoop and HBase have both upgraded protobuf. We should as well.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756723#comment-13756723 ] Thejas M Nair commented on HIVE-5137: - [~appodictic] I didn't understand your comment. fetchAll() method of which class?
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756730#comment-13756730 ] Edward Capriolo commented on HIVE-5137: --- HiveInterface.fetchAll(). I know we have scripts that call fetchAll() or fetchOne() on queries that probably do not have a result set. I wanted to make sure this will not be a breaking change for existing code.
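The behavior the patch targets can be modeled in miniature: hasResultSet should mirror whether the compiled plan actually ends in a fetch task, rather than being set unconditionally. The class and task names below are illustrative only; the real check lives in Hive's SQLOperation against the driver's query plan.

```java
import java.util.Arrays;
import java.util.List;

// Toy model: a query operation reports a result set only when the plan
// contains a fetch task; DDL-style plans report none.
public class ResultSetFlagSketch {
    static boolean hasResultSet(List<String> planTasks) {
        return planTasks.contains("FetchTask");
    }

    public static void main(String[] args) {
        // A SELECT-style plan ends in a fetch task, so a ResultSet is expected.
        System.out.println(hasResultSet(Arrays.asList("MapRedTask", "FetchTask")));
        // A CTAS/DDL plan has no fetch task, so no ResultSet should be returned.
        System.out.println(hasResultSet(Arrays.asList("DDLTask")));
    }
}
```

Under this model, Edward's concern reduces to what a client's fetch call does when the flag is false: an empty result versus an exception is exactly the compatibility question raised above.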
[jira] [Commented] (HIVE-5196) ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead.
[ https://issues.apache.org/jira/browse/HIVE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756733#comment-13756733 ] Thejas M Nair commented on HIVE-5196: - [~cwsteinbach] The jdbc driver itself does not do any logging, and relying on the application to follow good practices such as logging does not always work. Also, when something goes wrong, it is not always because of a user error. For debugging when something goes wrong, I think it will be very valuable to log the errors on the server side as well. The error here can probably be logged in the server as INFO (or WARN) instead of ERROR. cc [~prasadm] ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead. - Key: HIVE-5196 URL: https://issues.apache.org/jira/browse/HIVE-5196 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. Using e.printStackTrace is not suitable for production.
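The pattern being discussed, routing a server-side exception through a logger at an appropriate level instead of e.printStackTrace(), can be sketched as follows. The class, method, and messages are hypothetical (and java.util.logging stands in for Hive's actual logging framework); only the pattern matches the proposal.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch: capture the stack trace via the logger so production
// deployments record it in log files, rather than printing to stderr.
public class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    static String handleRequest(String request) {
        try {
            if (request == null) {
                throw new IllegalArgumentException("request must not be null");
            }
            return "ok";
        } catch (IllegalArgumentException e) {
            // WARNING rather than SEVERE: a bad request is often a client-side
            // problem, but the server still keeps the stack trace for debugging.
            LOG.log(Level.WARNING, "Error processing request", e);
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(null)); // logs a warning, prints "error"
        System.out.println(handleRequest("q")); // prints "ok"
    }
}
```

This also matches Thejas's point above: the level (INFO/WARN vs ERROR) is a separate decision from whether the server logs at all.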
[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756731#comment-13756731 ] Benjamin Jakobus commented on HIVE-5018: Here is some sample code:
{quote}
import java.math.BigInteger;
import java.security.SecureRandom;

/**
 * Examine the performance difference between declaring variables inside loops
 * and declaring them outside of loops.
 */
public class InLoopInstantiationTest {

  public InLoopInstantiationTest() {
    long start = System.currentTimeMillis();
    SessionIdentifierGenerator gen = new SessionIdentifierGenerator();
    for (int i = 0; i < 1; i++) {
      FooBar f = new FooBar();
      Integer i1 = new Integer(i);
      String s = gen.nextSessionId();
    }
    long end = System.currentTimeMillis();
    System.out.println("in loop instantiation took " + (end - start) + " milliseconds");

    start = System.currentTimeMillis();
    FooBar f;
    Integer i1;
    String s;
    for (int i = 0; i < 1; i++) {
      f = new FooBar();
      i1 = new Integer(i);
      s = gen.nextSessionId();
    }
    end = System.currentTimeMillis();
    System.out.println("avoiding in loop instantiation took " + (end - start) + " milliseconds");
  }

  public static void main(String[] args) {
    new InLoopInstantiationTest();
  }

  private class FooBar {
    private String foo = "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd"
        + "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd"
        + "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd";
  }

  public final class SessionIdentifierGenerator {
    private SecureRandom random = new SecureRandom();

    public String nextSessionId() {
      return new BigInteger(130, random).toString(32);
    }
  }
}
{quote}
The arithmetic script that I used to test this code in Hive is:
{quote}
SELECT (dataset.age * dataset.gpa + 3) AS F1,
       (dataset.age / dataset.gpa - 1.5) AS F2
FROM dataset
WHERE dataset.gpa > 0;
{quote}
Avoiding object instantiation in loops (issue 6) Key: HIVE-5018 URL: https://issues.apache.org/jira/browse/HIVE-5018 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0
Attachments: HIVE-5018.1.patch.txt Object instantiation inside loops is very expensive. Where possible, object references should be created outside the loop so that they can be reused.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756739#comment-13756739 ] Thejas M Nair commented on HIVE-5137: - HiveInterface is specific to hiveserver1, so this HS2 change will have no impact.
[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756749#comment-13756749 ] Thejas M Nair commented on HIVE-5018: - What was the runtime? To reduce the impact of noise caused by OS processes and other things running on the system, I would recommend making sure that each run lasts at least 100 seconds (by increasing the number of iterations in the loop), and repeating it a few times (3-4?). Can you please publish the numbers with that? If there is no noticeable performance difference, I would rather stick to instantiating the variables inside the loop. That limits the scope of these variables and makes for more readable code (and prevents accidental use).
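A harness along the lines Thejas suggests, long timed runs repeated several times with every measurement reported, might look like this. The workload, iteration count, and class name are illustrative, not the code attached to the JIRA.

```java
// Illustrative micro-benchmark harness: make each timed run long enough to
// dominate OS noise, repeat it a few times, and print every measurement.
public class LoopAllocBench {
    static final int ITERATIONS = 50_000_000; // tune upward until a run takes ~100s

    static long insideLoop() {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            Integer boxed = Integer.valueOf(i); // reference declared inside the loop
            sink += boxed;
        }
        consume(sink); // use the result so the JIT cannot drop the loop as dead code
        return System.nanoTime() - start;
    }

    static long outsideLoop() {
        long start = System.nanoTime();
        long sink = 0;
        Integer boxed; // reference declared once, outside the loop
        for (int i = 0; i < ITERATIONS; i++) {
            boxed = Integer.valueOf(i);
            sink += boxed;
        }
        consume(sink);
        return System.nanoTime() - start;
    }

    static void consume(long v) {
        if (v == Long.MIN_VALUE) System.out.println(v); // practically never true
    }

    public static void main(String[] args) {
        for (int run = 0; run < 4; run++) { // repeat 3-4 times, report each run
            System.out.printf("run %d: inside=%d ms, outside=%d ms%n",
                    run, insideLoop() / 1_000_000, outsideLoop() / 1_000_000);
        }
    }
}
```

Even with a harness like this, the JIT may compile both variants to identical machine code, which is consistent with the point made earlier in the thread that the reference slot is reused either way.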
Re: RFC: Major HCatalog refactoring
Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than whose change broke things). I think this is especially true given how many people are contributing to hive. On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then have them run periodically, manually, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11 version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods to existing classes? I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog so it has to happen before the branching. Most likely at the beginning of next week.
Thanks On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog as was agreed to when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive on 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically: 1. HIVE-4895 - change package name from org.apache.hcatalog to org.apache.hive.hcatalog 2. HIVE-4896 - create binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12 For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has a M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work w/o change with hive 0.12. The proposal is to make the changes in a way that has as little impact as possible on the build system, in part to make the upcoming ‘mavenization’ of hive easier, in part to make the changes more manageable. The list of public interfaces (and their transitive closure) for which backwards compat will be provided is: 1. HCatLoader 2. HCatStorer 3. HCatInputFormat 4. HCatOutputFormat 5. HCatReader 6. HCatWriter 7. HCatRecord 8. HCatSchema To achieve this, the 0.11 versions of these classes will be added in the org.apache.hcatalog package (after item 1 is done). Each of these classes -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756764#comment-13756764 ] Hive QA commented on HIVE-1511: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601094/HIVE-1511.10.patch {color:red}ERROR:{color} -1 due to 50 failed/errored test(s), 2905 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_input2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.ql.parse.TestParse.testParse_input3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_part1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_subq org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testsequencefile 
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_when org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input9 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udfnull org.apache.hadoop.hive.ql.parse.TestParse.testParse_union org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input16 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_case {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/591/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/591/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests 
failed with: TestsFailedException: 50 tests failed {noformat} This message is automatically generated. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Uploaded a patch for HIVE-5172. Can someone please review? Do I have to be a contributor before I submit a patch? Ran a test with the patch in our test environment (about 1000 Pig jobs to read and insert into a Hive table (via HCatalog), along with an equal number of alter table statements) for the past 4 days and haven't seen any error on the client or the server. On Thu, Aug 29, 2013 at 2:39 PM, agateaaa agate...@gmail.com wrote: Thanks Ashutosh. Filed https://issues.apache.org/jira/browse/HIVE-5172 On Thu, Aug 29, 2013 at 11:53 AM, Ashutosh Chauhan hashut...@apache.org wrote: Thanks Agatea for digging in. Seems like you have hit a bug. Would you mind opening a jira and adding your findings to it. Thanks, Ashutosh On Thu, Aug 29, 2013 at 11:22 AM, agateaaa agate...@gmail.com wrote: Sorry hit send too soon ... Hi All: Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to

@Override
public TUGIContainingTransport getTransport(TTransport trans) {
  // UGI information is not available at connection setup time, it will be set later
  // via set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
  // return transMap.get(trans); // <- change
  TUGIContainingTransport retTrans = transMap.get(trans);
  if (retTrans == null) {
    LOGGER.error("cannot find transport that was in map !!");
  } else {
    LOGGER.debug("found transport in map");
    return retTrans;
  }
}

When we run this in our test environment, we see that we run into the problem just after GC runs, and the "cannot find transport that was in map !!" message gets logged. Could the GC be collecting entries from transMap just before we get it? Tried a minor change which seems to work:

public TUGIContainingTransport getTransport(TTransport trans) {
  TUGIContainingTransport retTrans = transMap.get(trans);
  if (retTrans == null) {
    // UGI information is not available at connection setup time, it will be set later
    // via set_ugi() rpc.
    retTrans = new TUGIContainingTransport(trans);
    transMap.putIfAbsent(trans, retTrans);
  }
  return retTrans;
}

My questions for Hive and Thrift experts: 1.) Do we need to use a ConcurrentMap?

ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new MapMaker().weakKeys().weakValues().makeMap();

It does use == to compare keys (which might be the problem); also, in this case we can't rely on the trans always being there in the transMap, even after a put, so in that case the change above probably makes sense. 2.) Is it a better idea to use WeakHashMap with WeakReference instead? (was looking at org.apache.thrift.transport.TSaslServerTransport, esp. the change made by THRIFT-1468) e.g.

private static Map<TTransport, WeakReference<TUGIContainingTransport>> transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport, WeakReference<TUGIContainingTransport>>());

getTransport() would be something like

public TUGIContainingTransport getTransport(TTransport trans) {
  WeakReference<TUGIContainingTransport> ret = transMap3.get(trans);
  if (ret == null || ret.get() == null) {
    ret = new WeakReference<TUGIContainingTransport>(new TUGIContainingTransport(trans));
    transMap3.put(trans, ret); // No need for putIfAbsent().
    // Concurrent calls to getTransport() will pass in different TTransports.
  }
  return ret.get();
}

I did try 1.) above in our test environment and it does seem to resolve the problem, though I am not sure if I am introducing any other problem. Can someone help? Thanks Agatea On Thu, Aug 29, 2013 at 10:57 AM, agateaaa agate...@gmail.com wrote: Hi All: Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to @Override public TUGIContainingTransport getTransport(TTransport trans) { // UGI information is not available at connection setup time, it will be set later // via set_ugi() rpc.
transMap.putIfAbsent(trans, new TUGIContainingTransport(trans)); // return transMap.get(trans); // <- change TUGIContainingTransport retTrans = transMap.get(trans); if (retTrans == null) { } On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin. There aren't too many connections in CLOSE_WAIT state, only one or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the Thrift server which will tell Thrift to hold on to the client connection. See https://issues.apache.org/jira/browse/HIVE-2006 but it doesn't seem to have been implemented yet. We do have a client connection timeout set but cannot find an equivalent setting for the server. We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same hive table. We put in a workaround so that the two
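The option-2 pattern from this thread, a synchronized map holding weak references whose get() re-creates entries the GC has cleared, can be sketched generically. The class and the Factory interface below are hypothetical helpers, not Thrift or Hive code; note that WeakHashMap compares keys with equals() rather than the == identity comparison MapMaker.weakKeys() uses, which is part of what the thread is weighing.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative sketch: cache one wrapper per key with weak keys AND weak
// values, so the cache keeps neither side alive, while get() transparently
// re-creates an entry the garbage collector has already cleared.
public class WeakWrapperCache<K, V> {
    private final Map<K, WeakReference<V>> cache =
            Collections.synchronizedMap(new WeakHashMap<K, WeakReference<V>>());

    public interface Factory<K, V> {
        V create(K key);
    }

    public V get(K key, Factory<K, V> factory) {
        WeakReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // absent, or already collected by the GC
            value = factory.create(key);
            cache.put(key, new WeakReference<V>(value));
        }
        return value;
    }
}
```

The key property is the one the original bug hinged on: a lookup after GC never returns null to the caller, because a cleared entry is rebuilt on the spot.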
Re: RFC: Major HCatalog refactoring
One thing to note is that the 0.11 interfaces are going to be deprecated and will be taken away in a later release. When the interface is taken away, the additional unit tests will also go away. On Tue, Sep 3, 2013 at 9:57 AM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than whose change broke things). I think this is especially true given how many people are contributing to hive.
Re: RFC: Major HCatalog refactoring
I would say a main goal of unit and integration testing is to try all code paths. If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code, that would be an obvious win. I have not looked at your patch, but from your description it sounds like we are attempting to test a rename; that does not sound like a win to me. If the current hcatalog tests run in 15 minutes, you make a change and then the run is 30 minutes. 15 minutes is a nice long coffee break, 30 minutes is a TV show :) As for the overall hive build taking 10-15 hours. I know that :) I used to run them, by hand, on my laptop, because no one would share their build farm with me. I have heard that Hive consumes the vast majority of the resources of apache's build farm! I think we need to be good citizens at apache and attempt to make this better, not worse. Now that we have pre-commit builds we can work at a reasonable pace. Now that we have this nice pre-commit farm, I do not want to create a precedent that now we can go nuts and start down the same slippery slope.

On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than the one whose change broke things). I think this is especially true given how many people are contributing to hive.

On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically by hand, especially before releases?

On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11-version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework?

On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods in existing classes? I am just curious as to how this will play into the distributed testing framework.

On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible

On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module?

On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog so it has to happen before the branching. Most likely at the beginning of next week. Thanks

On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog, as was agreed to when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive in 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically:
1. HIVE-4895 - change the package name from org.apache.hcatalog to org.apache.hive.hcatalog
2. HIVE-4896 - create a binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12
For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements, as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has an M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work without change with Hive 0.12. The proposal is to make the changes in a way that has as little impact on the build system as possible, in part to make the upcoming ‘mavenization’ of Hive easier and in part to make the changes more manageable. The list of
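The item 2 compatibility layer amounts to leaving thin classes at the old org.apache.hcatalog coordinates that simply inherit the relocated implementation, so 0.11-era callers still resolve. A minimal, self-contained sketch of that idea — nested classes stand in for the two packages here, so this is illustrative rather than Hive's actual layout:

```java
// Hedged sketch of the HIVE-4896 idea: the moved implementation lives at the
// new coordinates, and an empty subclass kept at the old coordinates inherits
// it. In real Hive these are two packages (org.apache.hcatalog vs.
// org.apache.hive.hcatalog); nested classes stand in for them so the sketch
// compiles standalone.
public class CompatShimDemo {
    // Stand-in for "org.apache.hive.hcatalog.pig.HCatLoader" -- the moved code.
    static class NewHCatLoader {
        String load(String table) { return "loaded:" + table; }
    }

    // Stand-in for "org.apache.hcatalog.pig.HCatLoader" -- shim at the old name.
    static class OldHCatLoader extends NewHCatLoader {}

    public static void main(String[] args) {
        // A 0.11-era caller still instantiates the old name and gets the new behavior.
        System.out.println(new OldHCatLoader().load("tablename"));
    }
}
```

Because the shim has no body of its own, any fix made at the new coordinates is picked up by old-package callers automatically.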
[jira] [Commented] (HIVE-5182) log more stuff via PerfLogger
[ https://issues.apache.org/jira/browse/HIVE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756813#comment-13756813 ] Sergey Shelukhin commented on HIVE-5182: This is a known flaky test; the failure seems to be unrelated. log more stuff via PerfLogger - Key: HIVE-5182 URL: https://issues.apache.org/jira/browse/HIVE-5182 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5182.D12639.1.patch PerfLogger output is useful in understanding perf. There are large gaps in it, however, and it's not clear what is going on during these. Some sections are large and have no breakdown. It would be nice to add more stuff. At this point I'm not certain where exactly; whoever makes the patch (me?) will just need to look at the above gaps and fill them in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
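The gaps described above come from phases that are not bracketed by begin/end calls. The bracketing idea can be shown with a toy timer; this is a sketch only, not Hive's actual PerfLogger API, and the method names below merely echo its style:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the begin/end bracketing that PerfLogger-style timing
// relies on; HIVE-5182 asks for more such brackets so the time between logged
// phases becomes attributable. This class is NOT Hive's PerfLogger.
public class PhaseTimer {
    private final Map<String, Long> starts = new HashMap<>();

    void perfLogBegin(String phase) { starts.put(phase, System.nanoTime()); }

    // Returns elapsed nanos since the matching begin call for this phase.
    long perfLogEnd(String phase) { return System.nanoTime() - starts.remove(phase); }

    public static void main(String[] args) {
        PhaseTimer t = new PhaseTimer();
        t.perfLogBegin("semanticAnalyze");
        // ... the work being measured would run here ...
        long ns = t.perfLogEnd("semanticAnalyze");
        System.out.println("timed=" + (ns >= 0));
    }
}
```

Any stretch of execution that lacks a begin/end pair shows up only as an unexplained gap between its neighbors' timestamps, which is exactly the complaint in the description.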
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Priority: Minor (was: Major) Issue Type: Test (was: Bug) TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Priority: Minor Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-5129: Resolution: Fixed Status: Resolved (was: Patch Available) Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756845#comment-13756845 ] Edward Capriolo commented on HIVE-5137: --- Ok makes sense. A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
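The fix described above hinges on deriving hasResultSet from the compiled plan (does it end in a FetchTask?) rather than assuming every query produces rows. A simplified, hypothetical version of that decision — the task names here are illustrative stand-ins, not Hive's actual classes:

```java
import java.util.List;

// Hypothetical simplification of the check HIVE-5137 wants in SQLOperation:
// a statement should report a result set only when its plan contains a
// FetchTask. Enum values below are stand-ins for Hive's Task subclasses.
public class ResultSetCheck {
    enum Task { MAP_RED_TASK, MOVE_TASK, DDL_TASK, FETCH_TASK }

    static boolean hasResultSet(List<Task> plan) {
        // Only a plan that produces rows to fetch should expose a ResultSet.
        return plan.contains(Task.FETCH_TASK);
    }

    public static void main(String[] args) {
        // "create table t2 as select * from t1" ends in move/DDL tasks, no fetch.
        System.out.println(hasResultSet(List.of(Task.MAP_RED_TASK, Task.MOVE_TASK, Task.DDL_TASK)));
        // A plain "select * from t1" ends in a FetchTask.
        System.out.println(hasResultSet(List.of(Task.FETCH_TASK)));
    }
}
```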
[jira] [Created] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
Brock Noland created HIVE-5197: -- Summary: TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Bug Reporter: Brock Noland Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5060) JDBC driver assumes executeStatement is synchronous
[ https://issues.apache.org/jira/browse/HIVE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756847#comment-13756847 ] Henry Robinson commented on HIVE-5060: -- [~vgumashta] - sorry about the delay, I've uploaded the patch to review board here: https://reviews.apache.org/r/13948/ The approach in HIVE-4569 will be more general, but this fixes an immediate issue for other implementations of the HS2 API at very little cost to Hive. BTW, I believe the test failures from the patch are unrelated. JDBC driver assumes executeStatement is synchronous --- Key: HIVE-5060 URL: https://issues.apache.org/jira/browse/HIVE-5060 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Henry Robinson Fix For: 0.11.1, 0.12.0 Attachments: 0001-HIVE-5060-JDBC-driver-assumes-executeStatement-is-sy.patch, HIVE-5060.patch The JDBC driver seems to assume that {{ExecuteStatement}} is a synchronous call when performing updates via {{executeUpdate}}, where the following comment on the RPC in the Thrift file indicates otherwise: {code} // ExecuteStatement() // // Execute a statement. // The returned OperationHandle can be used to check on the // status of the statement, and to fetch results once the // statement has finished executing. {code} I understand that Hive's implementation of {{ExecuteStatement}} is blocking (see https://issues.apache.org/jira/browse/HIVE-4569), but presumably other implementations of the HiveServer2 API (and I'm talking specifically about Impala here, but others might have a similar concern) should be free to return a pollable {{OperationHandle}} per the specification. The JDBC driver's {{executeUpdate}} is as follows: {code} public int executeUpdate(String sql) throws SQLException { execute(sql); return 0; } {code} {{execute(sql)}} discards the {{OperationHandle}} that it gets from the server after determining whether there are results to be fetched. 
This is problematic for us, because Impala will cancel queries that are still running when a session exits, but there's no easy way to be sure that an {{INSERT}} statement has completed before terminating a session on the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
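The polling the Thrift comment allows for can be sketched as follows. OperationHandle and the state enum here are stand-ins for the Thrift-generated HiveServer2 types, and a real client would also sleep between polls and honor cancellation:

```java
// Hedged sketch of the fix direction in HIVE-5060: after ExecuteStatement,
// poll the returned OperationHandle until it reaches a terminal state instead
// of assuming the RPC was synchronous. Types below are illustrative stand-ins.
public class ExecutePoll {
    enum State { RUNNING, FINISHED, ERROR }

    interface OperationHandle { State getState(); }

    // Block until the operation leaves RUNNING; a production client would
    // back off between polls and enforce a timeout.
    static State waitForCompletion(OperationHandle op) {
        State s;
        while ((s = op.getState()) == State.RUNNING) { /* poll again */ }
        return s;
    }

    public static void main(String[] args) {
        // Simulate a server that reports RUNNING twice before FINISHED.
        int[] polls = {0};
        OperationHandle op = () -> polls[0]++ < 2 ? State.RUNNING : State.FINISHED;
        System.out.println(waitForCompletion(op));
        System.out.println("polls=" + polls[0]);
    }
}
```

Against Hive's blocking implementation the loop exits after one status check, which matches the "single extra RPC" cost claimed in the review request below.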
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Attachment: HIVE-5197.patch Trivial patch attached. TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Priority: Minor Attachments: HIVE-5197.patch Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
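For context on why a utility like HCatMapRedUtil beats new'ing the context directly: TaskAttemptContext is a concrete class in Hadoop 1 but became an interface (backed by TaskAttemptContextImpl) in Hadoop 2, so direct construction ties test code to one Hadoop line. A hedged sketch of the factory idea, using placeholder types rather than the real Hadoop/HCatalog classes:

```java
// Illustration of the shim-factory pattern behind HIVE-5197: callers ask a
// factory for a context instead of naming a version-specific concrete class.
// TaskContext and the two records are stand-ins, not Hadoop's actual types.
public class ContextFactoryDemo {
    interface TaskContext { String attemptId(); }

    // Stand-ins for the Hadoop-1 and Hadoop-2 concrete context classes.
    record Hadoop1Context(String attemptId) implements TaskContext {}
    record Hadoop2Context(String attemptId) implements TaskContext {}

    static TaskContext createTaskAttemptContext(boolean hadoop2, String id) {
        // The concrete class is chosen in exactly one place; callers compile
        // and run unchanged against either Hadoop line.
        return hadoop2 ? new Hadoop2Context(id) : new Hadoop1Context(id);
    }

    public static void main(String[] args) {
        System.out.println(createTaskAttemptContext(true, "attempt_001").attemptId());
        System.out.println(createTaskAttemptContext(false, "attempt_002").attemptId());
    }
}
```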
Review Request 13948: JDBC driver assumes executeStatement is synchronous
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13948/ --- Review request for hive. Bugs: HIVE-5060 https://issues.apache.org/jira/browse/HIVE-5060 Repository: hive-git Description --- This patch adds polling after the executeStatement call. In Hive's case, this results in a single extra RPC; for other implementations that may have made executeStatement asynchronous, it allows INSERTs to run to completion before returning from the JDBC-level execute method. Diffs - jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 982ceb8 Diff: https://reviews.apache.org/r/13948/diff/ Testing --- Confirmed that the INSERT problem goes away when run against Cloudera Impala, and that Hive requests see no observable latency penalty. Thanks, Henry Robinson
[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756817#comment-13756817 ] Sergey Shelukhin commented on HIVE-4914: somehow phabricator created the new review... not sure why filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which is most cases), the client gets all partition names from metastore, filters them, and asks for partitions by names for the filtered set. Metastore server code should do that instead; it should check if pushdown is possible and do it if so; otherwise it should do name-based filtering. Saves the roundtrip with all partition names from the server to client, and also removes the need to have pushdown viability checking on both sides. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5129: --- Fix Version/s: 0.12.0 Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12.0 Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Assignee: Brock Noland Status: Patch Available (was: Open) TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-5197.patch Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: RFC: Major HCatalog refactoring
Edward, If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me. Actually, this is not what we are testing. The package name change (as well as any changes made in 0.12) will be tested by the current tests (which will also change package name). The goal of bringing the 0.11 version of the source (and the corresponding tests) into 0.12 is to ensure that users who use HCatalog from scripts/MR jobs, etc. (e.g. a Pig script: A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();) will not have to update all their scripts/programs when upgrading to 0.12. Having the 0.11 tests in the 0.12 branch ensures that this compatibility layer continues to work while Hive 0.12 and later versions are evolving.
[jira] [Updated] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4914: -- Attachment: HIVE-4914.D12561.2.patch sershe updated the revision HIVE-4914 [jira] filtering via partition name should be done inside metastore server (implementation). Try to attach the patch manually Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12561 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12561?vs=39117id=39441#toc MANIPHEST TASKS https://reviews.facebook.net/T63 AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java metastore/if/hive_metastore.thrift metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java metastore/src/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java To: JIRA, 
ashutoshc, sershe filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which it is in most cases), the client gets all partition names from the metastore, filters them, and asks for partitions by name for the filtered set. The metastore server code should do that instead; it should check whether pushdown is possible and do it if so; otherwise it should do name-based filtering. This saves the roundtrip of all partition names from the server to the client, and also removes the need to have pushdown-viability checking on both sides. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
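The server-side flow the description proposes — filter the cheap partition name strings first, then materialize only the survivors — can be sketched in isolation. The names and helper below are illustrative, not the metastore's actual API:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hedged sketch of the HIVE-4914 flow: when expression pushdown to the
// underlying store is impossible, filter partition *names* on the server and
// only fetch the matching partitions, instead of shipping every name to the
// client and back. Method and type names are illustrative.
public class ServerSideNameFilter {
    static List<String> getPartitionsByFilter(List<String> allNames,
                                              Predicate<String> nameFilter) {
        // One server-side pass over the names; only the survivors would then
        // be materialized as full Partition objects and sent over the wire.
        return allNames.stream().filter(nameFilter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("ds=2013-09-01/hr=11", "ds=2013-09-01/hr=12",
                                     "ds=2013-09-02/hr=11");
        // Name predicate equivalent to the filter: ds = '2013-09-01'
        System.out.println(getPartitionsByFilter(names, n -> n.startsWith("ds=2013-09-01")));
    }
}
```

The win is that the client-to-server roundtrip carries one filter instead of the full name list in each direction.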
[jira] [Commented] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory
[ https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756866#comment-13756866 ] Daniel Dai commented on HIVE-4441: -- [~thejas] DistributedFileSystem.getHomeDirectory() has an annoying makeQualified(): {code} return new Path("/user/" + dfs.ugi.getShortUserName()).makeQualified(this); {code} I can't find an HDFS method which gives us the simple form without qualification. [~ekoifman] For the s3 file system, if the user specifies statusdir=s3://myoutput, the user means an absolute path. However, s3://myoutput is a relative path as per hdfs (isAbsolute()==false). But we cannot convert it into s3://user//myoutput since s3://user does not belong to the user. So here we skip the s3/asv filesystems. An e2e test is included in HIVE-5078 (e.g. Pig_9, in which we check the location of the stdout/stderr/syslog files). Sorry for the confusion. [HCatalog] WebHCat does not honor user home directory - Key: HIVE-4441 URL: https://issues.apache.org/jira/browse/HIVE-4441 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4441-1.patch, HIVE-4441-2.patch, HIVE-4441-3.patch If I submit a job as user A and I specify statusdir as a relative path, I would expect results to be stored in the folder relative to user A's home folder. For example, if I run: {code}curl -s -d user.name=hdinsightuser -d execute=show+tables; -d statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code} I get the results under: {code}/user/hdp/pokes.output{code} And I expect them to be under: {code}/user/hdinsightuser/pokes.output{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
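The resolution rule discussed above — rebase a relative statusdir under the submitting user's home directory, but leave scheme-qualified URIs like s3://myoutput alone — can be sketched with a small helper. This is an illustrative simplification, not WebHCat's actual code:

```java
import java.net.URI;

// Hedged sketch of the HIVE-4441 behavior: a relative statusdir resolves under
// /user/<submitting user>, while a path with its own scheme (s3, asv, hdfs, ...)
// or a leading slash is left as given, since e.g. s3://myoutput cannot be
// rebased under the user's HDFS home. Names here are illustrative.
public class StatusDirResolver {
    static String resolve(String statusdir, String user) {
        URI uri = URI.create(statusdir);
        if (uri.getScheme() != null || statusdir.startsWith("/")) {
            return statusdir;                      // scheme-qualified or already absolute
        }
        return "/user/" + user + "/" + statusdir;  // relative: rebase on the home dir
    }

    public static void main(String[] args) {
        System.out.println(resolve("pokes.output", "hdinsightuser"));
        System.out.println(resolve("s3://myoutput", "hdinsightuser"));
    }
}
```

With this rule, the curl example in the description lands under /user/hdinsightuser/pokes.output for user.name=hdinsightuser rather than under the server principal's home.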
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756867#comment-13756867 ] Hive QA commented on HIVE-5149: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601170/HIVE-5149.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 2905 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/592/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/592/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4959: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed this to trunk. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. Vectorized plan generation should be added as an optimization transform. Key: HIVE-4959 URL: https://issues.apache.org/jira/browse/HIVE-4959 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4959.1.patch, HIVE-4959.2.patch, HIVE-4959.3.patch Currently the query plan is vectorized at the query run time in the map task. It will be much cleaner to add vectorization as an optimization step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756916#comment-13756916 ] Hudson commented on HIVE-5129: -- FAILURE: Integrated in Hive-trunk-hadoop2 #401 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/401/]) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519764) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12.0 Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756886#comment-13756886 ] Ashutosh Chauhan edited comment on HIVE-4959 at 9/3/13 7:05 PM: Committed this to branch. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. was (Author: ashutoshc): Committed this to trunk. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. Vectorized plan generation should be added as an optimization transform. Key: HIVE-4959 URL: https://issues.apache.org/jira/browse/HIVE-4959 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4959.1.patch, HIVE-4959.2.patch, HIVE-4959.3.patch Currently the query plan is vectorized at the query run time in the map task. It will be much cleaner to add vectorization as an optimization step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5152: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Jitendra! Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756920#comment-13756920 ] Ashutosh Chauhan commented on HIVE-5152: For completeness, shall we make VectorFileSinkOp extend from FileSinkOp. Or is that not worthwhile? Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756933#comment-13756933 ] Thejas M Nair commented on HIVE-5131: - The changes look good. For the unit test, instead of making a change to the url used by all tests in TestBeeLineWithArgs, can you change it so that the url can be customized per test? Maybe something like this -
{code}
final static String BASE_JDBC_URL = BeeLine.BEELINE_DEFAULT_JDBC_URL + localhost:1
// set JDBC_URL to something else in test case, if it needs to be customized
String JDBC_URL = BASE_JDBC_URL;
// Use JDBC_URL to connect
{code}
JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
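To make the suggestion above concrete, here is a minimal sketch of per-test URL customization. The BASE_JDBC_URL value, the helper method, and the trailing "#key=value" hive-variable suffix are illustrative assumptions for this sketch, not the actual TestBeeLineWithArgs code.

```java
public class JdbcUrlSketch {
    // Shared base URL, as in the suggestion above; host/port are placeholders.
    static final String BASE_JDBC_URL = "jdbc:hive2://localhost:10000/default";

    // A test that needs hive variables derives its own URL from the base;
    // tests that don't need customization just use the base unchanged.
    static String withHiveVars(String base, String hiveVars) {
        if (hiveVars == null || hiveVars.isEmpty()) {
            return base;
        }
        return base + "#" + hiveVars;
    }

    public static void main(String[] args) {
        System.out.println(withHiveVars(BASE_JDBC_URL, ""));
        System.out.println(withHiveVars(BASE_JDBC_URL, "myvar=value"));
    }
}
```

This keeps the default connection path untouched for existing tests while letting the new hive-variables test build exactly the URL it needs.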
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5131: Status: Open (was: Patch Available) JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
Re: RFC: Major HCatalog refactoring
I understand. Can we do something like this? oldpackage.HCatLoader extends newpackage.HCatLoader { } If we do something like this we don't need to test both classes; it is safe to assume they both do the same thing. I understand that we do not want users to have to specify a new class name, but 15 minutes of unit tests around a re-name is overkill. On Tue, Sep 3, 2013 at 2:13 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Edward, If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me. Actually this is not what we are testing. The package name change (as well as any changes made in 0.12) will be tested by current tests (which will also change package name). The goal of bringing the 0.11 version of the source (and corresponding tests) into 0.12 is to ensure that users who use HCatalog from scripts/MR jobs, etc (e.g. a Pig script: A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();) will not have to update all their scripts/programs when upgrading to 0.12. Having 0.11 tests in the 0.12 branch ensures that this compatibility layer continues to work while Hive 0.12 and later versions are evolving. On Tue, Sep 3, 2013 at 10:22 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I would say a main goal of unit and integration testing is to try all code paths. If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me.
If the current hcatalog tests run in 15 minutes, you make a change and then the run is 30 minutes. 15 minutes is a nice long coffee break, 30 minutes is a TV show :) As for the overall hive build taking 10-15 hours. I know that :) I used to run them, by hand, on my laptop, because no one would share their build farm with me. I have heard that Hive consumes the vast majority of the resources of apache's build farm! I think we need to be good citizens at apache and attempt to make this better, not worse. Now that we have pre-commit builds we can work at a reasonable pace. Now that we have this nice pre-commit farm, I do not want to create a precedent that now we can go nuts, and start down the same slippery slope. On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than the person whose change broke things). I think this is especially true given how many people are contributing to hive. On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically by hand, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11 version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods to existing classes?
I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible. On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman
[jira] [Commented] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756921#comment-13756921 ] Ashutosh Chauhan commented on HIVE-5152: Oh sorry, missed that one. VectorFS already extends from FS. Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting.
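The inheritance pattern discussed in HIVE-5152 can be sketched as follows. The class names echo the FileSinkOperator/VectorFileSinkOperator pairing mentioned above, but the fields and method signatures here are simplified illustrations, not Hive's actual operator API: the vectorized subclass reuses the parent's state and plumbing and adds only a per-batch hot path.

```java
// Row-mode operator: processes one row per call.
class FileSinkOperator {
    protected long rowsWritten = 0;

    public void process(Object row) {
        rowsWritten++;
    }

    public long getRowsWritten() {
        return rowsWritten;
    }
}

// Vectorized operator: inherits configuration/state handling and adds a
// batch-at-a-time path, so only the hot loop is reimplemented.
class VectorFileSinkOperator extends FileSinkOperator {
    public void processBatch(Object[] batch) {
        rowsWritten += batch.length;
    }
}

public class OperatorInheritanceSketch {
    public static void main(String[] args) {
        VectorFileSinkOperator op = new VectorFileSinkOperator();
        op.process("row");
        op.processBatch(new Object[] {"a", "b", "c"});
        System.out.println(op.getRowsWritten()); // 4
    }
}
```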
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756936#comment-13756936 ] Thejas M Nair commented on HIVE-5131: - While you are at it, can you also make a minor change to the javadoc of TestBeeLineWithArgs.testScriptFile? Change @param expecttedPattern Text to look for in command output to @param expecttedPattern Text to look for in command output/error JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
[jira] [Commented] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory
[ https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756894#comment-13756894 ] Eugene Koifman commented on HIVE-4441: -- I think this needs to be in the JavaDoc and at least a debug level log statement needs to be added to indicate where the data is actually written. [HCatalog] WebHCat does not honor user home directory - Key: HIVE-4441 URL: https://issues.apache.org/jira/browse/HIVE-4441 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4441-1.patch, HIVE-4441-2.patch, HIVE-4441-3.patch If I submit a job as user A and I specify statusdir as a relative path, I would expect results to be stored in the folder relative to the user A's home folder. For example, if I run: {code}curl -s -d user.name=hdinsightuser -d execute=show+tables; -d statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code} I get the results under: {code}/user/hdp/pokes.output{code} And I expect them to be under: {code}/user/hdinsightuser/pokes.output{code}
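The behavior requested in HIVE-4441 can be sketched as a small resolution rule. The method below is hypothetical (not WebHCat's code); the /user/&lt;name&gt; layout follows the HDFS home-directory convention, and the commented LOG line illustrates the debug statement Eugene asks for.

```java
public class StatusDirSketch {
    // Resolve statusdir the way the bug report expects: absolute paths are
    // honored as-is, relative paths land under the submitting user's home.
    static String resolveStatusDir(String statusdir, String userName) {
        if (statusdir.startsWith("/")) {
            return statusdir;
        }
        String resolved = "/user/" + userName + "/" + statusdir;
        // A debug-level log here would make the destination discoverable:
        // LOG.debug("statusdir resolved to " + resolved);
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveStatusDir("pokes.output", "hdinsightuser"));
        System.out.println(resolveStatusDir("/tmp/out", "hdinsightuser"));
    }
}
```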
[jira] [Commented] (HIVE-5113) webhcat should allow configuring memory used by templetoncontroller map job in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756941#comment-13756941 ] Eugene Koifman commented on HIVE-5113: -- [~thejas] Some comments: 1. can templeton.controller.map.mem be named mapreduce.map.memory.mb, i.e. the same as the Hadoop prop? I think it is easier to follow if props are not renamed. Or at least webhcat(templeton).mapreduce.map.memory.mb if you are worried about namespace collisions. (same for other new props) 2. The description in webhcat-default.xml: Could it say that this is a Hadoop option and is passed directly/as is to the Templeton Controller MR job? (In some cases the default is specified as '512' and in others '-Xmx300m'.) It would tell users where to go get more doc that explains what these props do. 3. templeton.controller.mr.child.opts is defined in the code but not in webhcat-default.xml. Is that intentional? 4. public static final String HADOOP_MAP_JAVA_OPTS = mapreduce.map.java.opts; public static final String HADOOP_MAP_MEMORY = mapreduce.map.memory.mb; public static final String HADOOP_AM_MEMORY = yarn.app.mapreduce.am.resource.mb; public static final String HADOOP_AM_JAVA_OPTS = yarn.app.mapreduce.am.command-opts; Are there symbolic constants in the Hadoop code base for these? Can they be used here? webhcat should allow configuring memory used by templetoncontroller map job in hadoop2 -- Key: HIVE-5113 URL: https://issues.apache.org/jira/browse/HIVE-5113 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5113.1.patch Webhcat should allow the following hadoop2 config parameters to be set for the templetoncontroller map-only job that actually runs the pig/hive/mr command: mapreduce.map.memory.mb, yarn.app.mapreduce.am.resource.mb, yarn.app.mapreduce.am.command-opts. They should also be set to reasonable defaults.
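Point 2 above, about documenting the pass-through nature of these knobs, might look like the following webhcat-default.xml entry. The property name follows the comment's suggestion of mirroring the Hadoop name under a templeton prefix, and the value is illustrative; this is a sketch, not the actual shipped default.

```xml
<!-- Illustrative entry; name and value are assumptions from the discussion above. -->
<property>
  <name>templeton.mapreduce.map.memory.mb</name>
  <value>512</value>
  <description>Hadoop option mapreduce.map.memory.mb, passed as-is to the
    TempletonControllerJob map task. See the Hadoop MapReduce documentation
    for the semantics of this option.</description>
</property>
```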
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756943#comment-13756943 ] Xuefu Zhang commented on HIVE-3976: --- [~jdere] Thanks for posting your code regarding precision/scale, and your comments about related UDFs. There seems to be a lot of work, but we hope the outcome is worth the effort. It's good that you have gained insight with your char/varchar work. It will be valuable. [~hagleitn] Thanks for sharing your thoughts. I agree that this is complex enough to need a spec, with which the issue may be closed quicker and more easily by a large community. The questions you posted are valid and yet to be answered. The existing decimal feature seems incomplete and non-standard in many ways. With this task, we can hope to put it in good shape. In principle, I think we should follow the standard if available, and follow some implementation or have hive's own implementation when the standard does not define something. Doing that seems to make it unavoidable to break backward compatibility. But how much can we break? For instance, can we say that a decimal without precision and scale specified defaults to (10, 0) (as mysql does) rather than the current (38, ?)? It's great if we can redefine everything and do it right, once and for all. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang Attachments: remove_prec_scale.diff HIVE-2693 introduced support for the Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too.
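What DECIMAL(precision, scale) enforcement means can be illustrated with java.math.BigDecimal (this is an illustration of the semantics, not Hive's implementation): scale fixes the digits after the decimal point, and a value with more than precision significant digits after scaling must be rejected.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalSketch {
    // Hypothetical enforcement helper: round to the declared scale, then
    // verify the result fits within the declared total precision.
    static BigDecimal enforce(BigDecimal v, int precision, int scale) {
        BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
        if (scaled.precision() > precision) {
            throw new ArithmeticException(
                "value does not fit DECIMAL(" + precision + "," + scale + ")");
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Fits DECIMAL(20,2): up to 18 integer digits, 2 fractional digits.
        System.out.println(enforce(new BigDecimal("1234.567"), 20, 2)); // 1234.57
    }
}
```

Under this rule a default of (10, 0), as in MySQL, would silently truncate fractional digits and cap values at 10 integer digits, which is the backward-compatibility question raised above.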
Re: RFC: Major HCatalog refactoring
You may have already said this but remind me again. If we go with this approach, how long until we retire the duplicated code and insist end users use the new name? 1 release? A similar debate is likely why the hive classes are still packaged as org.apache.hadoop.hive, rather than org.apache.hive.
Re: RFC: Major HCatalog refactoring
There are already a few things in hcat that I would argue are things we do not need to test. Example: testAMQListener. I would say it is a bit out of scope to have a component like this. Could we mock the message queue? Do we need these dependencies in the project? I cannot even fathom how long the tests will run post vectorization, post tez. Maybe I am the only one who worries about these things.
Re: RFC: Major HCatalog refactoring
We explored the idea you suggest and given the number of APIs (and their transitive closure) it would be very difficult and the result would be fragile. So unfortunately that is not possible. For example, oldpackage.A has a method foo() that returns oldpackage.B. You could create newpackage.A extends oldpackage.A { @Override newpackage.B foo() { } } which works because of covariant return types, but the implementation of foo() becomes problematic because it itself uses other classes.
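Eugene's covariant-return example from this thread can be sketched as follows. Since separate packages cannot be shown in one file, the Old*/New* prefixes stand in for oldpackage/newpackage; all names are illustrative. The override compiles because NewB is a subtype of OldB, but note that foo()'s body must construct the new type itself, and in a real API every referenced type needs the same treatment, which is why the approach was judged fragile.

```java
// "old" API
class OldB {
}

class OldA {
    OldB foo() {
        return new OldB();
    }
}

// "new" API: the covariant override is legal since Java 5 because
// NewB is a subtype of OldB, but the body must re-wrap every type it touches.
class NewB extends OldB {
}

class NewA extends OldA {
    @Override
    NewB foo() {
        return new NewB();
    }
}

public class CovariantSketch {
    public static void main(String[] args) {
        // Callers holding the old type transparently get the new behavior...
        OldA a = new NewA();
        System.out.println(a.foo() instanceof NewB); // true
        // ...but this only works if every method returning an old-package
        // type is overridden, across the API's whole transitive closure.
    }
}
```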
[jira] [Commented] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756812#comment-13756812 ] Harish Butani commented on HIVE-5129: - Committed to trunk. Thanks, Vikram! Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form:
{noformat}
from studenttab10k
insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name
insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age
insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name;
{noformat}
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5149: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Yin! groupby2.q failed because .q.out needed an update, which I did. TestMTQueries didn't fail for me and looks flaky. That test takes an inordinately long time (30 mins) to execute and seems to spend all its time waiting to release locks in ZK. There is some threading issue going on there. Needs some investigation. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Resolved] (HIVE-4586) [HCatalog] WebHCat should return 404 error for undefined resource
[ https://issues.apache.org/jira/browse/HIVE-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved HIVE-4586. -- Resolution: Fixed Hadoop Flags: Reviewed Patch committed to trunk. [HCatalog] WebHCat should return 404 error for undefined resource - Key: HIVE-4586 URL: https://issues.apache.org/jira/browse/HIVE-4586 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4586-1.patch, HIVE-4586-2.patch
[jira] [Created] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
Eugene Koifman created HIVE-5198: Summary: WebHCat returns exitcode 143 (w/o an explanation) Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Filing this bug mostly to help anyone trying to decipher the 143 error code, which does not appear in the source code. In 0.12 error reporting was improved and this case reports a stack trace. This error code means that the metastore client could not connect to the metastore. This is likely a config issue, with hive.metastore.uris not being set. The message might look like this: {"statement":"use default; show table extended like xyz;","error":"unable to show table: xyz","exec":{"stdout":"","stderr":"","exitcode":"143"}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
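(Editor's note) A minimal client-side hive-site.xml entry for the hive.metastore.uris setting mentioned above might look like the following sketch; the host and port are placeholders, not values taken from this report:

```xml
<property>
  <name>hive.metastore.uris</name>
  <!-- placeholder host/port; point this at your running metastore service -->
  <value>thrift://metastore-host.example.com:9083</value>
</property>
```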
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756969#comment-13756969 ] Yin Huai commented on HIVE-5149: groupby2.q is also used in TestMTQueries. Probably the failure of TestMTQueries was also caused by groupby2. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756973#comment-13756973 ] Ashutosh Chauhan commented on HIVE-5186: I am not sure we want this. Wherever possible RSDeDup should compact the query plan (since it avoids adding new operators in the pipeline). Only for cases which cannot be handled by RSDeDup should the Correlation Optimizer kick in. In this patch, it seems like you are removing a case which can be handled by RSDeDup. Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care of patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756975#comment-13756975 ] Ashutosh Chauhan commented on HIVE-5149: Right. Yeah, likely that was the reason for TestMTQueries failures. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5186: --- Status: Open (was: Patch Available) Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757012#comment-13757012 ] Hive QA commented on HIVE-5112: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601174/HIVE-5112.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2906 tests executed *Failed tests:* {noformat} org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/593/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/593/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Upgrade protobuf to 2.5 from 2.4 Key: HIVE-5112 URL: https://issues.apache.org/jira/browse/HIVE-5112 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Owen O'Malley Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch Hadoop and HBase have both upgraded protobuf. We should as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5113) webhcat should allow configuring memory used by templetoncontroller map job in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756978#comment-13756978 ] Thejas M Nair commented on HIVE-5113: - canceling patch while comments are addressed. webhcat should allow configuring memory used by templetoncontroller map job in hadoop2 -- Key: HIVE-5113 URL: https://issues.apache.org/jira/browse/HIVE-5113 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5113.1.patch WebHCat should allow the following hadoop2 config parameters to be set for the templetoncontroller map-only job that actually runs the pig/hive/mr command: mapreduce.map.memory.mb yarn.app.mapreduce.am.resource.mb yarn.app.mapreduce.am.command-opts They should also be set to reasonable defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
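(Editor's note) The three hadoop2 parameters named in the issue above could be supplied in a Hadoop job configuration file like the following sketch; the values shown are illustrative placeholders, not the defaults this issue proposes:

```xml
<!-- sketch: memory settings for the templeton controller map-only job -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- placeholder: container size for the map task -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value> <!-- placeholder: container size for the MR app master -->
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx819m</value> <!-- placeholder: JVM heap, kept below the container size -->
</property>
```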
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: HIVE-4443-3.patch [~ekoifman] That's fine, I don't have a preference on the API. I also borrowed the code from other parts of Templeton. Reattached the patch with the API change, resynced with trunk. [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4443-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter should be an optional parameter to the call, independent of the security check. I would suggest an additional parameter so that GET queue (jobs) gives you all the jobs a user has permission for: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5131: -- Attachment: HIVE-5131.1.patch Patch is updated based on review comments. JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5187) Enhance explain to indicate vectorized execution of operators.
[ https://issues.apache.org/jira/browse/HIVE-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5187: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Jitendra! Enhance explain to indicate vectorized execution of operators. -- Key: HIVE-5187 URL: https://issues.apache.org/jira/browse/HIVE-5187 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-5187.1.patch Explain should be able to indicate whether an operator will be executed in vectorized mode or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: HIVE-4442-3.patch [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: (was: HIVE-4443-3.patch) [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757019#comment-13757019 ] Phabricator commented on HIVE-4617: --- thejas has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS conf/hive-default.xml.template:1851 the value here also needs to be updated to 500 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:144 The client is not getting the error details in case of async exec code path. We need to address that. service/src/java/org/apache/hive/service/cli/session/SessionManager.java:59 Do you plan to make the change to new thread pool as part of this jira ? If not, you might want to set the default number of async threads to be lower, and increase it as part of the new thread pool change. REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach, thejas ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run a queries asynchronously. Current executeStatement call blocks until the query run is complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757010#comment-13757010 ] Yin Huai commented on HIVE-5186: Right, I am removing cases associated with the pattern JOIN%.*%RS%. Here are my reasons. 1. It seems JoinReducerProc will kick in only when hive.auto.convert.join=false and hive.auto.convert.join.noconditionaltask=false. Because of these conditions, I thought it may be hard to trigger this part of the code in practice. 2. For some cases, I am not sure if we can generate an executable plan when both JoinReducerProc and Correlation Optimizer fire. For example,
{code}
      JOIN3
      /    \
   GBY1    GBY2
     |       |
   JOIN1   JOIN2
{code}
If all of these five operators share the same key, JoinReducerProc will drop the RS between JOIN1 and GBY1, and drop the RS between JOIN2 and GBY2. Then, Correlation Optimizer will try to drop the RSs associated with JOIN3. Since there is no Mux between JOIN1 and GBY1, and between JOIN2 and GBY2, I am not sure if the plan is executable. But I have not tried this case. Will give it a try and post what I find. Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care of patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756979#comment-13756979 ] Sushanth Sowmyan commented on HIVE-5104: Actually, on manual application, the whitespace errors are easy enough to fix. I'm uploading a whitespace-corrected version of the patch and will try to get this patch in before the freeze if tests pass. HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt Unable to store boolean values into an HCat table. Assume in Hive you have two tables... CREATE TABLE btest(test BOOLEAN); CREATE TABLE btest2(test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get ERROR 115: Unsupported type 5 in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. Might have been overlooked when boolean was added to Pig in 0.10. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757061#comment-13757061 ] Eugene Koifman commented on HIVE-4442: -- The point is that UgiFactory creates a proxy user with proper credentials, while UserGroupInformation.createRemoteUser() works in simple security mode... Generally, in WebHCat a param user is determined by Server#getDoAsUser(). If doAs is specified, the user=doAs, otherwise it's the user making the call. In the HIVE-4442.3.patch StatusDelegator uses UgiFactory to get UserGroupInformation but the other 2 use UserGroupInformation.createRemoteUser(). So from a security point of view I think Delete/List/StatusDelegator should all use UgiFactory with user as argument. UserGroupInformation.getLoginUser() will return the user running WebHCat (hcat by default). [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5199) Read Only Custom SerDe works with HDP 1.1 but not with HDP 1.3
Hari Sankar Sivarama Subramaniyan created HIVE-5199: --- Summary: Read Only Custom SerDe works with HDP 1.1 but not with HDP 1.3 Key: HIVE-5199 URL: https://issues.apache.org/jira/browse/HIVE-5199 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Custom serdes which used to work in HDP 1.1 are no longer working with HDP 1.3. The issue happens when the partition serde is not of settable type in HDP 1.3. The exception below occurs via FetchOperator as well as MapOperator. Inside FetchOperator, consider the following call: getRecordReader() -> ObjectInspectorConverters.getConverter(). The output object inspector is of settable type (because it is generated via ObjectInspectorConverters.getConvertedOI()), whereas the input object inspector, which is passed in as serde.getObjectInspector(), is non-settable. Inside getConverter(), the (inputOI.equals(outputOI)) check fails and the switch statement tries to cast the non-settable object inspector to a settable object inspector.
The stack trace is as follows:
2013-08-28 17:57:25,307 ERROR CliDriver (SessionState.java:printError(432)) - Failed with exception java.io.IOException:java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
java.io.IOException: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:307)
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:406)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
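(Editor's note) The failing cast in the HIVE-5199 report above can be illustrated with a self-contained sketch. The interfaces below are simplified stand-ins for Hive's object-inspector hierarchy (the real ones live in org.apache.hadoop.hive.serde2.objectinspector); the point is that an unchecked cast to a settable interface throws ClassCastException for a read-only inspector, which is what getConverter() runs into when the equality check fails.

```java
// Simplified stand-ins for Hive's object-inspector interfaces.
interface MapObjectInspector {}
interface SettableMapObjectInspector extends MapObjectInspector {}

// A read-only inspector, like the custom ProtoMapObjectInspector in this
// report: it implements the base interface but not the settable variant.
class ReadOnlyMapObjectInspector implements MapObjectInspector {}

public class SettableCastDemo {
    // Mirrors the problematic step in getConverter(): an unchecked cast
    // that assumes every inspector reaching this point is settable.
    static boolean castToSettable(MapObjectInspector inputOI) {
        try {
            SettableMapObjectInspector settable = (SettableMapObjectInspector) inputOI;
            return settable != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A read-only inspector cannot be cast to the settable interface.
        System.out.println(castToSettable(new ReadOnlyMapObjectInspector())); // prints "false"
    }
}
```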
Re: Review Request 11334: HIVE-4568 Beeline needs to support resolving variables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/#review25859 --- beeline/src/java/org/apache/hive/beeline/BeeLine.java https://reviews.apache.org/r/11334/#comment50420 that's fine beeline/src/java/org/apache/hive/beeline/BeeLine.properties https://reviews.apache.org/r/11334/#comment50421 It is more accurate to say that this is a hive-specific variable, as beeline is still a generic tool. beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java https://reviews.apache.org/r/11334/#comment50422 Yes, makes sense to call this hivevariables itself. beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java https://reviews.apache.org/r/11334/#comment50423 This change has not been made in the revised patch. - Thejas Nair On Aug. 24, 2013, 8:19 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/ --- (Updated Aug. 24, 2013, 8:19 p.m.) Review request for hive and Ashutosh Chauhan. Bugs: HIVE-4568 https://issues.apache.org/jira/browse/HIVE-4568 Repository: hive-git Description --- 1. Added command variable substitution 2. Added test case Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 4c6eb9b beeline/src/java/org/apache/hive/beeline/BeeLine.properties b6650cf beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 61bdeee beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java c70003d beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java 030f6b0 Diff: https://reviews.apache.org/r/11334/diff/ Testing --- Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757047#comment-13757047 ] Sergey Shelukhin commented on HIVE-5107: I was thinking about splitting the metastore client from the server, not just thrift (thrift should be in the client too), so that users of the metastore wouldn't have to depend on the server. In particular, right now the metastore server cannot use anything in QL without indirect code, because QL uses metastore client/common bits. Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5096) Add q file tests for ORC predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5096: --- Status: Open (was: Patch Available) There is already an over10k file in data/files/. Can you use that one in your tests instead of adding a new one? Add q file tests for ORC predicate pushdown --- Key: HIVE-5096 URL: https://issues.apache.org/jira/browse/HIVE-5096 Project: Hive Issue Type: Test Components: CLI, File Formats, StorageHandler Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-5096.patch Add q file tests that check the validity of the results when predicate pushdown is turned on and off. Also test for filter expressions in the table scan operator when predicate pushdown is turned on for ORC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757049#comment-13757049 ] Sergey Shelukhin commented on HIVE-5107: I figure if there will be disturbance in the build anyway, I can do it right after, to not have disturbance for too long :) Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5049) Create an ORC test case that has a 0.11 ORC file
[ https://issues.apache.org/jira/browse/HIVE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757080#comment-13757080 ] Ashutosh Chauhan commented on HIVE-5049: +1 Create an ORC test case that has a 0.11 ORC file Key: HIVE-5049 URL: https://issues.apache.org/jira/browse/HIVE-5049 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Prasanth J Attachments: HIVE-5049.patch.txt, orc-file-11-format.orc We should add a test case that includes a 0.11.0 ORC file to ensure compatibility for reading old ORC files is kept correct. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757043#comment-13757043 ] Thejas M Nair commented on HIVE-5131: - [~xuefuz] This was never documented in the doc page, so this is actually adding a new feature. Can you add documentation for this part of the URL format in the release note of the jira? Once this is committed it can be moved to the wiki page as an upcoming 0.12 feature. The wiki page - https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
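(Editor's note) The HiveServer2 Clients wiki referenced above documents the JDBC connection URL shape with hive variables carried in the fragment part of the string. A sketch, with placeholder host, port, and variable names (confirm the exact grammar against the wiki):

```
jdbc:hive2://<host>:<port>/<dbName>;sess_var_list?hive_conf_list#hive_var_list

For example (placeholders):
jdbc:hive2://localhost:10000/default#hivevar1=value1;hivevar2=value2
```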
[jira] [Commented] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows
[ https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757091#comment-13757091 ] Sushanth Sowmyan commented on HIVE-5014: Filename was not set appropriately for jenkins auto-build to pick it up. However, I have manually verified this. +1, committing. [HCatalog] Fix HCatalog build issue on Windows -- Key: HIVE-5014 URL: https://issues.apache.org/jira/browse/HIVE-5014 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5014-1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5157) ReduceSinkDeDuplication ignores hive.groupby.skewindata=true
[ https://issues.apache.org/jira/browse/HIVE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5157: --- Description: If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q and groupby2.q (was: If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q) ReduceSinkDeDuplication ignores hive.groupby.skewindata=true - Key: HIVE-5157 URL: https://issues.apache.org/jira/browse/HIVE-5157 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q and groupby2.q
[jira] [Updated] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows
[ https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5014: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks, Daniel. [HCatalog] Fix HCatalog build issue on Windows -- Key: HIVE-5014 URL: https://issues.apache.org/jira/browse/HIVE-5014 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5014-1.patch
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756967#comment-13756967 ] Sushanth Sowmyan commented on HIVE-5104: Hi, from looking through it, the patch looks good from a functionality perspective. Thank you for the patch. However, it does not apply cleanly on trunk, has whitespace issues (trailing whitespace), and needs regeneration. As part of HIVE-4869, all HCatalog jiras will be frozen for a couple of days, as the package renaming effort happening there affects all jiras. Could you please regenerate your patch after that and re-apply? HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
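The failure mode described in the report (a type-mapping switch with no boolean case falling through to an "unsupported type" error) can be sketched minimally. This is not the actual HCatBaseStorer code; the class, method, and the INTEGER/CHARARRAY constants below are hypothetical, though BOOLEAN = 5 matches the type id in the reported "Unsupported type 5" error:

```java
// Minimal sketch of the reported bug pattern: a switch mapping Pig type
// codes to Hive type names that omits BOOLEAN, so boolean values hit the
// "unsupported" default branch. All names here are illustrative.
public class TypeMappingSketch {
    public static final byte BOOLEAN = 5;    // type id seen in "Unsupported type 5"
    public static final byte INTEGER = 10;   // illustrative value
    public static final byte CHARARRAY = 55; // illustrative value

    public static String hiveTypeFor(byte pigType) {
        switch (pigType) {
            case INTEGER:   return "int";
            case CHARARRAY: return "string";
            // Missing: case BOOLEAN: return "boolean";  <-- the reported bug
            default:
                throw new IllegalArgumentException("Unsupported type " + pigType);
        }
    }

    public static void main(String[] args) {
        System.out.println(hiveTypeFor(INTEGER));   // prints int
        try {
            hiveTypeFor(BOOLEAN);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());     // prints Unsupported type 5
        }
    }
}
```

The fix suggested by the report is simply adding the missing `case BOOLEAN` arm to the switch.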
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5131: Status: Patch Available (was: Open) +1. Making it Patch Available to kick off the tests. JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address the Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to separate this into a different issue.
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-1511: --- Attachment: HIVE-1511.11.patch v11 fixes the HBase tests. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.11.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference, I did this as a test case: SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM, but I gave up after the test case did not go anywhere for about 2 minutes.
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757028#comment-13757028 ] Sushanth Sowmyan commented on HIVE-5104: One more required change: null, as presented in the modification to the test, does not succeed; it needs to be NULL. HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt, HIVE-5104.2.patch Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
[jira] [Updated] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5104: --- Attachment: HIVE-5104.2.patch HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt, HIVE-5104.2.patch Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757092#comment-13757092 ] Ashutosh Chauhan commented on HIVE-4750: +1 Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.
[jira] [Commented] (HIVE-5096) Add q file tests for ORC predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757094#comment-13757094 ] Prasanth J commented on HIVE-5096: -- I just manually added a few empty (NULL) values to specific columns for checking against NULL/NOT NULL predicates. Since the minimum index stride is 1000, I kept the total number of rows in the file at 1050 so that there are 2 index strides in the ORC file. If required, I can remove the few columns that are not being used by the tests and keep only the relevant columns. Add q file tests for ORC predicate pushdown --- Key: HIVE-5096 URL: https://issues.apache.org/jira/browse/HIVE-5096 Project: Hive Issue Type: Test Components: CLI, File Formats, StorageHandler Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-5096.patch Add q file tests that check the validity of the results when predicate pushdown is turned on and off. Also test for filter expressions in the table scan operator when predicate pushdown is turned on for ORC.
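The row count in the comment above is chosen so the test file spans more than one ORC row-index stride. A quick sketch of that arithmetic (the class and method names are mine, not part of the ORC API):

```java
// Sketch: how many index strides an ORC file's row index covers, given a
// row count and a row-index stride. 1050 rows at a 1000-row stride yields
// 2 strides, matching the comment above.
public class StrideCount {
    // Ceiling division: strides needed to cover `rows` rows at `stride` rows each
    public static int strides(int rows, int stride) {
        return (rows + stride - 1) / stride;
    }

    public static void main(String[] args) {
        System.out.println(strides(1050, 1000)); // prints 2
        System.out.println(strides(1000, 1000)); // prints 1
    }
}
```

With two strides in the file, a pushed-down predicate can be observed skipping one stride while reading the other, which is what makes 1050 a useful row count for these tests.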
[jira] [Updated] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4750: --- Resolution: Fixed Status: Resolved (was: Patch Available) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757112#comment-13757112 ] Ashutosh Chauhan commented on HIVE-4750: Tested 6,7,8 on both Mac OS and Ubuntu after the patch. All 3 passed on both OSes. Committed to trunk. Thanks, Prasanth! Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.