[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756385#comment-13756385 ] Thejas M Nair commented on HIVE-5018: - I haven't seen that checkstyle error you are seeing. I am not sure why it is happening. You can try canceling the patch, uploading a new file, and making it patch available again to see if that happens on the pre-commit test environment as well. I went through some of the changes, and I see that you are trying to save on object reference creation, such as the following -
{code}
--- a/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
+++ b/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
@@ -113,8 +113,9 @@ public boolean cleanUp(String rowID) {
     scan.setFilter(filter);
     ResultScanner scanner = htable.getScanner(scan);
     ArrayList<Delete> toDelete = new ArrayList<Delete>();
+    Delete delete;
     for (Result result : scanner) {
-      Delete delete = new Delete(result.getRow());
+      delete = new Delete(result.getRow());
       toDelete.add(delete);
     }
     htable.delete(toDelete);
{code}
While object creation has significant costs associated with it, I don't think this reference re-use will have any real impact. A reference is like a pointer in C/C++: a memory location that stores the address of the object. The JVM would be able to re-use this memory location in the existing implementation. Can you give more details of the arithmetic program you ran to check the performance difference (including the code, total runtime, and how many times you ran it)? Avoiding object instantiation in loops (issue 6) Key: HIVE-5018 URL: https://issues.apache.org/jira/browse/HIVE-5018 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: HIVE-5018.1.patch.txt Object instantiation inside loops is very expensive.
Where possible, object references should be created outside the loop so that they can be reused. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
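A minimal, self-contained sketch of Thejas's point above (the nested Delete class here is a stand-in for HBase's Delete, not the real one): hoisting the *declaration* out of the loop does not remove the per-iteration allocation, because `new` still runs on every pass; only the reference slot moves, and the JIT treats the local variable identically either way.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceHoisting {
    // Stand-in for org.apache.hadoop.hbase.client.Delete.
    static class Delete {
        final byte[] row;
        Delete(byte[] row) { this.row = row; }
    }

    static List<Delete> declarationInsideLoop(int n) {
        List<Delete> toDelete = new ArrayList<Delete>();
        for (int i = 0; i < n; i++) {
            Delete delete = new Delete(new byte[] {(byte) i}); // n allocations
            toDelete.add(delete);
        }
        return toDelete;
    }

    static List<Delete> declarationOutsideLoop(int n) {
        List<Delete> toDelete = new ArrayList<Delete>();
        Delete delete; // only the reference slot is hoisted
        for (int i = 0; i < n; i++) {
            delete = new Delete(new byte[] {(byte) i}); // still n allocations
            toDelete.add(delete);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Both variants allocate exactly n Delete objects.
        System.out.println(declarationInsideLoop(100).size());  // 100
        System.out.println(declarationOutsideLoop(100).size()); // 100
    }
}
```

Any micro-benchmark claiming a difference between the two forms is likely measuring JIT warm-up or GC noise rather than the declaration placement.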
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756390#comment-13756390 ] Vaibhav Gumashta commented on HIVE-4617: Just for comparison, the metastore currently has 200 min and 100,000 max threads. Proposing here that HS2 have 500 Thrift workers and 500 async threads. Also, I was thinking of changing the async thread pool to a cached thread pool, which will not keep all 500 threads alive all the time, but will create new threads when required, keeping the unused ones alive for a certain time before purging them. Will be interesting to hear more thoughts. Thanks! ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
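The pool behavior described above can be sketched with a plain ThreadPoolExecutor (class and method names here are illustrative, not the actual HS2 code; the 10-second keep-alive is an arbitrary placeholder). Note that Executors.newCachedThreadPool() is unbounded, so a capped pool with on-demand creation and idle-thread purging needs allowCoreThreadTimeOut:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncPoolSketch {
    static ThreadPoolExecutor newAsyncPool(int maxThreads, long keepAliveSecs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,          // core == max, i.e. a hard cap
            keepAliveSecs, TimeUnit.SECONDS, // idle threads die after this
            new LinkedBlockingQueue<Runnable>());
        // Without this flag, core threads never time out and all
        // maxThreads stay alive once created.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newAsyncPool(500, 10);
        System.out.println(pool.getMaximumPoolSize()); // 500
        System.out.println(pool.getPoolSize());        // 0 -- threads are created on demand
        pool.shutdown();
    }
}
```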
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756398#comment-13756398 ] Hudson commented on HIVE-5163: -- FAILURE: Integrated in Hive-trunk-hadoop2 #398 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/398/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756399#comment-13756399 ] Hudson commented on HIVE-5163: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #82 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/82/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756400#comment-13756400 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #82 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/82/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
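The fix described in this issue can be modeled in miniature (class and task names below are hypothetical, not Hive's actual SQLOperation/Task classes): derive hasResultSet from whether the compiled plan actually contains a fetch task, rather than assuming every statement produces rows.

```java
import java.util.Arrays;
import java.util.List;

public class ResultSetDecision {
    // Report a result set only when the plan will actually fetch rows.
    static boolean hasResultSet(List<String> planTasks) {
        return planTasks.contains("FetchTask");
    }

    public static void main(String[] args) {
        // A plain SELECT ends in a fetch; a CTAS ends in move/DDL work.
        List<String> select = Arrays.asList("MapRedTask", "FetchTask");
        List<String> ctas   = Arrays.asList("MapRedTask", "MoveTask", "DDLTask");
        System.out.println(hasResultSet(select)); // true
        System.out.println(hasResultSet(ctas));   // false
    }
}
```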
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756418#comment-13756418 ] Phabricator commented on HIVE-4617: --- cwsteinbach has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS conf/hive-default.xml.template:1857 This value doesn't match the one listed in HiveConf (500). conf/hive-default.xml.template:1864 Do you think people will actually want to set this value to a fraction of a second? If not, I recommend changing the units to seconds. service/if/TCLIService.thrift:42 Please add a blank line between lines 41 and 42. service/src/java/org/apache/hive/service/cli/OperationState.java:68 I think WAITING->FINISHED and WAITING->CLOSED probably aren't valid transitions, but I'm not sure. What do you think? service/src/java/org/apache/hive/service/cli/session/SessionManager.java:59 s/VG/TODO/ service/src/java/org/apache/hive/service/cli/session/SessionManager.java:81 10,000ms should not be hardcoded. Please reference 'timeout' instead. service/src/java/org/apache/hive/service/cli/OperationState.java:75 If RUNNING->WAITING is not a valid transition (which makes sense to me), then maybe we should use the adjective PENDING instead of WAITING. There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
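The state-transition questions in the review above can be made concrete with a small validation sketch (not the actual OperationState code; the transition table below is illustrative and follows the reviewer's suggestion of PENDING rather than WAITING):

```java
import java.util.EnumMap;
import java.util.EnumSet;

public class OperationStateSketch {
    enum State { INITIALIZED, PENDING, RUNNING, FINISHED, CANCELED, CLOSED, ERROR }

    private static final EnumMap<State, EnumSet<State>> VALID =
        new EnumMap<State, EnumSet<State>>(State.class);
    static {
        // Illustrative table: PENDING may start running or be canceled, but
        // may not jump straight to FINISHED/CLOSED, and a RUNNING operation
        // may not fall back to PENDING.
        VALID.put(State.INITIALIZED, EnumSet.of(State.PENDING, State.RUNNING, State.CANCELED, State.CLOSED));
        VALID.put(State.PENDING,     EnumSet.of(State.RUNNING, State.CANCELED, State.ERROR));
        VALID.put(State.RUNNING,     EnumSet.of(State.FINISHED, State.CANCELED, State.ERROR));
        VALID.put(State.FINISHED,    EnumSet.of(State.CLOSED));
        VALID.put(State.CANCELED,    EnumSet.of(State.CLOSED));
        VALID.put(State.ERROR,       EnumSet.of(State.CLOSED));
        VALID.put(State.CLOSED,      EnumSet.noneOf(State.class));
    }

    static boolean isValidTransition(State from, State to) {
        EnumSet<State> allowed = VALID.get(from);
        return allowed != null && allowed.contains(to);
    }

    public static void main(String[] args) {
        System.out.println(isValidTransition(State.PENDING, State.RUNNING));  // true
        System.out.println(isValidTransition(State.PENDING, State.FINISHED)); // false
        System.out.println(isValidTransition(State.RUNNING, State.PENDING));  // false
    }
}
```

Encoding the table explicitly makes "is X->Y valid?" a code-review question about one map entry rather than scattered if-statements.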
[jira] [Commented] (HIVE-5196) ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead.
[ https://issues.apache.org/jira/browse/HIVE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756425#comment-13756425 ] Carl Steinbach commented on HIVE-5196: -- [~vgumashta] Thanks for catching this. I think we may actually want to remove these statements altogether, the rationale being that we probably don't want to write a message to the server's log every time a client tries to execute an illegal statement or calls an RPC with invalid input parameter values. Ideally this error information should be returned directly to the client instead. ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. - Key: HIVE-5196 URL: https://issues.apache.org/jira/browse/HIVE-5196 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. Using e.printStackTrace is not suitable for production.
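A sketch of the logging alternative under discussion (java.util.logging stands in here only to keep the example self-contained; Hive itself uses Apache Commons Logging, and the message text and method name are illustrative): pass the exception object to the logger so the stack trace goes to the configured log, where it can also be filtered or dropped, instead of unconditionally hitting stderr.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    // Instead of e.printStackTrace(): log the message and the throwable
    // together, so the stack trace lands in the server log (or is
    // suppressed, depending on the configured log level).
    static String handle(Exception e) {
        String msg = "Error executing statement: " + e.getMessage();
        LOG.log(Level.WARNING, msg, e);
        return msg;
    }

    public static void main(String[] args) {
        System.out.println(handle(new IllegalArgumentException("bad parameter")));
    }
}
```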
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756429#comment-13756429 ] Phabricator commented on HIVE-4617: --- vaibhavgumashta has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:152 This should go. RUNNING is set in runInternal. REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run queries asynchronously. The current executeStatement call blocks until the query run is complete.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756465#comment-13756465 ] Hudson commented on HIVE-5137: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #149 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/149/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756464#comment-13756464 ] Hudson commented on HIVE-5163: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #149 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/149/]) HIVE-5163 : refactor org.apache.hadoop.mapred.HCatMapRedUtil - HIVE-5163.update.2 (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519538) * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatMapRedUtil.java refactor org.apache.hadoop.mapred.HCatMapRedUtil Key: HIVE-5163 URL: https://issues.apache.org/jira/browse/HIVE-5163 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update, HIVE-5163.update.2 Everything that this class does is delegated to a Shim class. To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of HCatMapRedUtil and make the calls directly to the Shim layer. It will make it easier because all org.apache.hcatalog classes will move to org.apache.hive.hcatalog classes, thus making way to provide binary backwards compat. This class won't change its name, so it's more difficult to provide backwards compat. The org.apache.hadoop.mapred.TempletonJobTracker is not an issue since it goes away in HIVE-4460.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756583#comment-13756583 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-hadoop2 #399 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/399/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Status: Patch Available (was: Open) trigger the pre-commit build ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Status: Open (was: Patch Available) ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756646#comment-13756646 ] Brock Noland commented on HIVE-1511: Looks like there was a compilation issue in HCatalog. I see a ptest trunk build ran successfully after this, so I kicked it off again. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes.
Re: RFC: Major HCatalog refactoring
OK, that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically and manually, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11-version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods on existing classes? I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog, so it has to happen before the branching. Most likely at the beginning of next week. Thanks On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog as was agreed when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive in 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically: 1.
HIVE-4895 - change package name from org.apache.hcatalog to org.apache.hive.hcatalog 2. HIVE-4896 - create binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12 For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements, as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has an M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work without change with Hive 0.12. The proposal is to make changes that have as little impact on the build system as possible, in part to make the upcoming ‘mavenization’ of Hive easier, in part to make the changes more manageable. The list of public interfaces (and their transitive closure) for which backwards compat will be provided: 1. HCatLoader 2. HCatStorer 3. HCatInputFormat 4. HCatOutputFormat 5. HCatReader 6. HCatWriter 7. HCatRecord 8. HCatSchema To achieve this, the 0.11 versions of these classes will be added in the org.apache.hcatalog package (after item 1 is done). Each of these classes -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Attachment: HIVE-5149.3.patch ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5149: --- Attachment: (was: HIVE-5149.3.patch) ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756680#comment-13756680 ] Edward Capriolo commented on HIVE-5137: --- If you call fetchAll() does it return an empty List or throw an exception? There may be some users calling fetchAll() regardless of the query. A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756671#comment-13756671 ] Hudson commented on HIVE-5137: -- FAILURE: Integrated in Hive-trunk-h0.21 #2307 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2307/]) HIVE-5137: A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask (Vaibhav Gumashta via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519547) * /hive/trunk/jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java * /hive/trunk/service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set.
[jira] [Updated] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5112: --- Attachment: HIVE-5112.2.patch v2 of the patch uses Hadoop 2.1.0-beta. Upgrade protobuf to 2.5 from 2.4 Key: HIVE-5112 URL: https://issues.apache.org/jira/browse/HIVE-5112 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Owen O'Malley Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch Hadoop and HBase have both upgraded protobuf. We should as well.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756723#comment-13756723 ] Thejas M Nair commented on HIVE-5137: - [~appodictic] I didn't understand your comment. fetchAll() method of which class?
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756730#comment-13756730 ] Edward Capriolo commented on HIVE-5137: --- HiveInterface.fetchAll(). I know we have scripts that call fetchAll() or fetchOne() on queries that probably do not have a result set. I wanted to make sure this will not be a breaking change for existing code.
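The behavior the patch targets can be modeled in miniature: hasResultSet should mirror whether the compiled plan actually ends in a fetch task, rather than being set unconditionally. The class and task names below are illustrative only; the real check lives in Hive's SQLOperation against the driver's query plan.

```java
import java.util.Arrays;
import java.util.List;

// Toy model: a query operation reports a result set only when the plan
// contains a fetch task; DDL-style plans report none.
public class ResultSetFlagSketch {
    static boolean hasResultSet(List<String> planTasks) {
        return planTasks.contains("FetchTask");
    }

    public static void main(String[] args) {
        // A SELECT-style plan ends in a fetch task, so a ResultSet is expected.
        System.out.println(hasResultSet(Arrays.asList("MapRedTask", "FetchTask")));
        // A CTAS/DDL plan has no fetch task, so no ResultSet should be returned.
        System.out.println(hasResultSet(Arrays.asList("DDLTask")));
    }
}
```

Under this model, Edward's concern reduces to what a client's fetch call does when the flag is false: an empty result versus an exception is exactly the compatibility question raised above.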
[jira] [Commented] (HIVE-5196) ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead.
[ https://issues.apache.org/jira/browse/HIVE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756733#comment-13756733 ] Thejas M Nair commented on HIVE-5196: - [~cwsteinbach] The jdbc driver itself does not do any logging, and relying on the application to follow good practices such as logging does not always work. Also, when something goes wrong, it is not always because of a user error. For debugging when something goes wrong, I think it will be very valuable to log the errors on the server side as well. The error here can probably be logged in the server as INFO (or WARN) instead of ERROR. cc [~prasadm] ThriftCLIService.java uses stderr to print the stack trace, it should use the logger instead. - Key: HIVE-5196 URL: https://issues.apache.org/jira/browse/HIVE-5196 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta ThriftCLIService.java uses stderr to print the stack trace; it should use the logger instead. Using e.printStackTrace is not suitable for production.
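The pattern being discussed, routing a server-side exception through a logger at an appropriate level instead of e.printStackTrace(), can be sketched as follows. The class, method, and messages are hypothetical (and java.util.logging stands in for Hive's actual logging framework); only the pattern matches the proposal.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch: capture the stack trace via the logger so production
// deployments record it in log files, rather than printing to stderr.
public class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    static String handleRequest(String request) {
        try {
            if (request == null) {
                throw new IllegalArgumentException("request must not be null");
            }
            return "ok";
        } catch (IllegalArgumentException e) {
            // WARNING rather than SEVERE: a bad request is often a client-side
            // problem, but the server still keeps the stack trace for debugging.
            LOG.log(Level.WARNING, "Error processing request", e);
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(null)); // logs a warning, prints "error"
        System.out.println(handleRequest("q")); // prints "ok"
    }
}
```

This also matches Thejas's point above: the level (INFO/WARN vs ERROR) is a separate decision from whether the server logs at all.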
[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756731#comment-13756731 ] Benjamin Jakobus commented on HIVE-5018: Here is some sample code:
{quote}
import java.math.BigInteger;
import java.security.SecureRandom;

/**
 * Examine the performance difference between declaring variables inside loops
 * and declaring them outside of loops.
 */
public class InLoopInstantiationTest {

  public InLoopInstantiationTest() {
    long start = System.currentTimeMillis();
    SessionIdentifierGenerator gen = new SessionIdentifierGenerator();
    for (int i = 0; i < 1; i++) {
      FooBar f = new FooBar();
      Integer i1 = new Integer(i);
      String s = gen.nextSessionId();
    }
    long end = System.currentTimeMillis();
    System.out.println("in loop instantiation took " + (end - start) + " milliseconds");

    start = System.currentTimeMillis();
    FooBar f;
    Integer i1;
    String s;
    for (int i = 0; i < 1; i++) {
      f = new FooBar();
      i1 = new Integer(i);
      s = gen.nextSessionId();
    }
    end = System.currentTimeMillis();
    System.out.println("avoiding in loop instantiation took " + (end - start) + " milliseconds");
  }

  public static void main(String[] args) {
    new InLoopInstantiationTest();
  }

  private class FooBar {
    private String foo = "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd"
        + "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd"
        + "asdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasdasd";
  }

  public final class SessionIdentifierGenerator {
    private SecureRandom random = new SecureRandom();

    public String nextSessionId() {
      return new BigInteger(130, random).toString(32);
    }
  }
}
{quote}
The arithmetic script that I used to test this code in Hive is:
{quote}
SELECT (dataset.age * dataset.gpa + 3) AS F1,
       (dataset.age / dataset.gpa - 1.5) AS F2
FROM dataset
WHERE dataset.gpa > 0;
{quote}
Avoiding object instantiation in loops (issue 6) Key: HIVE-5018 URL: https://issues.apache.org/jira/browse/HIVE-5018 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0
Attachments: HIVE-5018.1.patch.txt Object instantiation inside loops is very expensive. Where possible, object references should be created outside the loop so that they can be reused.
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756739#comment-13756739 ] Thejas M Nair commented on HIVE-5137: - HiveInterface is specific to hiveserver1, so this HS2 change will have no impact.
[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756749#comment-13756749 ] Thejas M Nair commented on HIVE-5018: - What was the runtime? To reduce the impact of noise caused by OS processes and other things running on the system, I would recommend making sure that each run lasts at least 100 seconds (by increasing the number of iterations in the loop), and repeating it a few times (3-4?). Can you please publish the numbers with that? If there is no noticeable performance difference, I would rather stick to instantiating the variables inside the loop. That limits the scope of these variables and makes for more readable code (and prevents accidental use).
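A harness along the lines Thejas suggests, long timed runs repeated several times with every measurement reported, might look like this. The workload, iteration count, and class name are illustrative, not the code attached to the JIRA.

```java
// Illustrative micro-benchmark harness: make each timed run long enough to
// dominate OS noise, repeat it a few times, and print every measurement.
public class LoopAllocBench {
    static final int ITERATIONS = 50_000_000; // tune upward until a run takes ~100s

    static long insideLoop() {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            Integer boxed = Integer.valueOf(i); // reference declared inside the loop
            sink += boxed;
        }
        consume(sink); // use the result so the JIT cannot drop the loop as dead code
        return System.nanoTime() - start;
    }

    static long outsideLoop() {
        long start = System.nanoTime();
        long sink = 0;
        Integer boxed; // reference declared once, outside the loop
        for (int i = 0; i < ITERATIONS; i++) {
            boxed = Integer.valueOf(i);
            sink += boxed;
        }
        consume(sink);
        return System.nanoTime() - start;
    }

    static void consume(long v) {
        if (v == Long.MIN_VALUE) System.out.println(v); // practically never true
    }

    public static void main(String[] args) {
        for (int run = 0; run < 4; run++) { // repeat 3-4 times, report each run
            System.out.printf("run %d: inside=%d ms, outside=%d ms%n",
                    run, insideLoop() / 1_000_000, outsideLoop() / 1_000_000);
        }
    }
}
```

Even with a harness like this, the JIT may compile both variants to identical machine code, which is consistent with the point made earlier in the thread that the reference slot is reused either way.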
Re: RFC: Major HCatalog refactoring
Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than whose change broke things). I think this is especially true given how many people are contributing to hive. On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then have them run periodically, manually, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11 version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods to existing classes? I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog so it has to happen before the branching. Most likely at the beginning of next week.
Thanks On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog as was agreed to when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive on 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically: 1. HIVE-4895 - change package name from org.apache.hcatalog to org.apache.hive.hcatalog 2. HIVE-4896 - create binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12 For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has a M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work w/o change with hive 0.12. The proposal is to make the changes in a way that has as little impact as possible on the build system, in part to make the upcoming ‘mavenization’ of hive easier, in part to make the changes more manageable. The list of public interfaces (and their transitive closure) for which backwards compat will be provided is: 1. HCatLoader 2. HCatStorer 3. HCatInputFormat 4. HCatOutputFormat 5. HCatReader 6. HCatWriter 7. HCatRecord 8. HCatSchema To achieve this, the 0.11 versions of these classes will be added in the org.apache.hcatalog package (after item 1 is done). Each of these classes -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756764#comment-13756764 ] Hive QA commented on HIVE-1511: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601094/HIVE-1511.10.patch {color:red}ERROR:{color} -1 due to 50 failed/errored test(s), 2905 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_input2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.ql.parse.TestParse.testParse_input3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_part1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_subq org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testsequencefile 
org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_when org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2 org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input9 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udfnull org.apache.hadoop.hive.ql.parse.TestParse.testParse_union org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input16 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4 org.apache.hadoop.hive.ql.parse.TestParse.testParse_join3 org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1 org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf_case {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/591/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/591/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests 
failed with: TestsFailedException: 50 tests failed {noformat} This message is automatically generated. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Uploaded a patch for HIVE-5172. Can someone please review? Do I have to be a contributor before I submit a patch? Ran a test with the patch in our test environment (about 1000 Pig jobs to read and insert into a Hive table (via HCatalog), along with an equal number of alter table statements) for the past 4 days and haven't seen any error on the client or the server. On Thu, Aug 29, 2013 at 2:39 PM, agateaaa agate...@gmail.com wrote: Thanks Ashutosh. Filed https://issues.apache.org/jira/browse/HIVE-5172 On Thu, Aug 29, 2013 at 11:53 AM, Ashutosh Chauhan hashut...@apache.org wrote: Thanks Agatea for digging in. Seems like you have hit a bug. Would you mind opening a jira and adding your findings to it. Thanks, Ashutosh On Thu, Aug 29, 2013 at 11:22 AM, agateaaa agate...@gmail.com wrote: Sorry hit send too soon ... Hi All: Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to

@Override
public TUGIContainingTransport getTransport(TTransport trans) {
  // UGI information is not available at connection setup time, it will be set later
  // via set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
  // return transMap.get(trans); // <- change
  TUGIContainingTransport retTrans = transMap.get(trans);
  if (retTrans == null) {
    LOGGER.error("cannot find transport that was in map !!");
  } else {
    LOGGER.debug("found transport in map");
    return retTrans;
  }
}

When we run this in our test environment, we see that we run into the problem just after GC runs, and the "cannot find transport that was in map !!" message gets logged. Could the GC be collecting entries from transMap just before we get it? Tried a minor change which seems to work:

public TUGIContainingTransport getTransport(TTransport trans) {
  TUGIContainingTransport retTrans = transMap.get(trans);
  if (retTrans == null) {
    // UGI information is not available at connection setup time, it will be set later
    // via set_ugi() rpc.
    retTrans = new TUGIContainingTransport(trans);
    transMap.putIfAbsent(trans, retTrans);
  }
  return retTrans;
}

My questions for Hive and Thrift experts: 1.) Do we need to use a ConcurrentMap?

ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new MapMaker().weakKeys().weakValues().makeMap();

It does use == to compare keys (which might be the problem); also, in this case we can't rely on the trans always being there in the transMap, even after a put, so in that case the change above probably makes sense. 2.) Is it a better idea to use WeakHashMap with WeakReference instead? (was looking at org.apache.thrift.transport.TSaslServerTransport, esp. the change made by THRIFT-1468) e.g.

private static Map<TTransport, WeakReference<TUGIContainingTransport>> transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport, WeakReference<TUGIContainingTransport>>());

getTransport() would be something like

public TUGIContainingTransport getTransport(TTransport trans) {
  WeakReference<TUGIContainingTransport> ret = transMap3.get(trans);
  if (ret == null || ret.get() == null) {
    ret = new WeakReference<TUGIContainingTransport>(new TUGIContainingTransport(trans));
    transMap3.put(trans, ret); // No need for putIfAbsent().
    // Concurrent calls to getTransport() will pass in different TTransports.
  }
  return ret.get();
}

I did try 1.) above in our test environment and it does seem to resolve the problem, though I am not sure if I am introducing any other problem. Can someone help? Thanks Agatea On Thu, Aug 29, 2013 at 10:57 AM, agateaaa agate...@gmail.com wrote: Hi All: Put some debugging code in TUGIContainingTransport.getTransport() and I tracked it down to @Override public TUGIContainingTransport getTransport(TTransport trans) { // UGI information is not available at connection setup time, it will be set later // via set_ugi() rpc.
transMap.putIfAbsent(trans, new TUGIContainingTransport(trans)); // return transMap.get(trans); // <- change TUGIContainingTransport retTrans = transMap.get(trans); if (retTrans == null) { } On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin. There aren't too many connections in CLOSE_WAIT state, only one or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the Thrift server which will tell Thrift to hold on to the client connection. See https://issues.apache.org/jira/browse/HIVE-2006 but it doesn't seem to have been implemented yet. We do have a client connection timeout set but cannot find an equivalent setting for the server. We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same hive table. We put in a workaround so that the two
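The option-2 pattern from this thread, a synchronized map holding weak references whose get() re-creates entries the GC has cleared, can be sketched generically. The class and the Factory interface below are hypothetical helpers, not Thrift or Hive code; note that WeakHashMap compares keys with equals() rather than the == identity comparison MapMaker.weakKeys() uses, which is part of what the thread is weighing.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative sketch: cache one wrapper per key with weak keys AND weak
// values, so the cache keeps neither side alive, while get() transparently
// re-creates an entry the garbage collector has already cleared.
public class WeakWrapperCache<K, V> {
    private final Map<K, WeakReference<V>> cache =
            Collections.synchronizedMap(new WeakHashMap<K, WeakReference<V>>());

    public interface Factory<K, V> {
        V create(K key);
    }

    public V get(K key, Factory<K, V> factory) {
        WeakReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // absent, or already collected by the GC
            value = factory.create(key);
            cache.put(key, new WeakReference<V>(value));
        }
        return value;
    }
}
```

The key property is the one the original bug hinged on: a lookup after GC never returns null to the caller, because a cleared entry is rebuilt on the spot.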
Re: RFC: Major HCatalog refactoring
One thing to note is that the 0.11 interfaces are going to be deprecated and will be taken away in a later release. When the interface is taken away, the additional unit tests will also go away. On Tue, Sep 3, 2013 at 9:57 AM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than whose change broke things). I think this is especially true given how many people are contributing to hive.
Re: RFC: Major HCatalog refactoring
I would say a main goal of unit and integration testing is to try all code paths. If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code, that would be an obvious win. I have not looked at your patch, but from your description it sounds like we are attempting to test a rename; that does not sound like a win to me. If the current hcatalog tests run in 15 minutes, you make a change and then the run is 30 minutes. 15 minutes is a nice long coffee break, 30 minutes is a TV show :) As for the overall hive build taking 10-15 hours. I know that :) I used to run them, by hand, on my laptop, because no one would share their build farm with me. I have heard that Hive consumes the vast majority of the resources of apache's build farm! I think we need to be good citizens at apache and attempt to make this better, not worse. Now that we have pre-commit builds we can work at a reasonable pace. Now that we have this nice pre-commit farm, I do not want to create a precedent that now we can go nuts and start down the same slippery slope.

On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than the one whose change broke things). I think this is especially true given how many people are contributing to hive.

On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically by hand, especially before releases?

On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11-version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework?

On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods in existing classes? I am just curious as to how this will play into the distributed testing framework.

On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible

On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module?

On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote: This will change every file under hcatalog so it has to happen before the branching. Most likely at the beginning of next week. Thanks

On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Hi, Here is the plan for refactoring HCatalog, as was agreed to when it was merged into Hive. HIVE-4869 is the umbrella bug for this work. The changes are complex and touch every single file under hcatalog. Please comment. When the HCatalog project was merged into Hive in 0.11, several integration items did not make the 0.11 deadline. It was agreed to finish them in the 0.12 release. Specifically:
1. HIVE-4895 - change the package name from org.apache.hcatalog to org.apache.hive.hcatalog
2. HIVE-4896 - create a binary backwards compatibility layer for hcat users upgrading from 0.11 to 0.12
For item 1, we’ll just move every file under org.apache.hcatalog to org.apache.hive.hcatalog and update all “package” and “import” statements, as well as all hcat/webhcat scripts. This will include all JUnit tests. Item 2 will ensure that if a user has an M/R program or Pig script, etc. that uses the HCatalog public API, their programs will continue to work without change with Hive 0.12. The proposal is to make the changes in a way that has as little impact on the build system as possible, in part to make the upcoming ‘mavenization’ of Hive easier and in part to make the changes more manageable. The list of
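The item 2 compatibility layer amounts to leaving thin classes at the old org.apache.hcatalog coordinates that simply inherit the relocated implementation, so 0.11-era callers still resolve. A minimal, self-contained sketch of that idea — nested classes stand in for the two packages here, so this is illustrative rather than Hive's actual layout:

```java
// Hedged sketch of the HIVE-4896 idea: the moved implementation lives at the
// new coordinates, and an empty subclass kept at the old coordinates inherits
// it. In real Hive these are two packages (org.apache.hcatalog vs.
// org.apache.hive.hcatalog); nested classes stand in for them so the sketch
// compiles standalone.
public class CompatShimDemo {
    // Stand-in for "org.apache.hive.hcatalog.pig.HCatLoader" -- the moved code.
    static class NewHCatLoader {
        String load(String table) { return "loaded:" + table; }
    }

    // Stand-in for "org.apache.hcatalog.pig.HCatLoader" -- shim at the old name.
    static class OldHCatLoader extends NewHCatLoader {}

    public static void main(String[] args) {
        // A 0.11-era caller still instantiates the old name and gets the new behavior.
        System.out.println(new OldHCatLoader().load("tablename"));
    }
}
```

Because the shim has no body of its own, any fix made at the new coordinates is picked up by old-package callers automatically.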
[jira] [Commented] (HIVE-5182) log more stuff via PerfLogger
[ https://issues.apache.org/jira/browse/HIVE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756813#comment-13756813 ] Sergey Shelukhin commented on HIVE-5182: This is a known flaky test; the failure seems to be unrelated. log more stuff via PerfLogger - Key: HIVE-5182 URL: https://issues.apache.org/jira/browse/HIVE-5182 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5182.D12639.1.patch PerfLogger output is useful in understanding perf. There are large gaps in it, however, and it's not clear what is going on during these. Some sections are large and have no breakdown. It would be nice to add more stuff. At this point I'm not certain where exactly; whoever makes the patch (me?) will just need to look at the above gaps and fill them in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
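The gaps described above come from phases that are not bracketed by begin/end calls. The bracketing idea can be shown with a toy timer; this is a sketch only, not Hive's actual PerfLogger API, and the method names below merely echo its style:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the begin/end bracketing that PerfLogger-style timing
// relies on; HIVE-5182 asks for more such brackets so the time between logged
// phases becomes attributable. This class is NOT Hive's PerfLogger.
public class PhaseTimer {
    private final Map<String, Long> starts = new HashMap<>();

    void perfLogBegin(String phase) { starts.put(phase, System.nanoTime()); }

    // Returns elapsed nanos since the matching begin call for this phase.
    long perfLogEnd(String phase) { return System.nanoTime() - starts.remove(phase); }

    public static void main(String[] args) {
        PhaseTimer t = new PhaseTimer();
        t.perfLogBegin("semanticAnalyze");
        // ... the work being measured would run here ...
        long ns = t.perfLogEnd("semanticAnalyze");
        System.out.println("timed=" + (ns >= 0));
    }
}
```

Any stretch of execution that lacks a begin/end pair shows up only as an unexplained gap between its neighbors' timestamps, which is exactly the complaint in the description.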
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Priority: Minor (was: Major) Issue Type: Test (was: Bug) TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Priority: Minor Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-5129: Resolution: Fixed Status: Resolved (was: Patch Available) Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5137) A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask
[ https://issues.apache.org/jira/browse/HIVE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756845#comment-13756845 ] Edward Capriolo commented on HIVE-5137: --- Ok makes sense. A Hive SQL query should not return a ResultSet when the underlying plan does not include a FetchTask Key: HIVE-5137 URL: https://issues.apache.org/jira/browse/HIVE-5137 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-5137.D12453.1.patch, HIVE-5137.D12453.2.patch, HIVE-5137.D12453.3.patch, HIVE-5137.D12453.4.patch, HIVE-5137.D12453.5.patch, HIVE-5137.D12453.6.patch, HIVE-5137.D12453.7.patch, HIVE-5137.D12453.8.patch Currently, a query like create table if not exists t2 as select * from t1 sets the hasResultSet to true in SQLOperation and in turn, the query returns a result set. However, as a DDL command, this should ideally not return a result set. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
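The fix described above hinges on deriving hasResultSet from the compiled plan (does it end in a FetchTask?) rather than assuming every query produces rows. A simplified, hypothetical version of that decision — the task names here are illustrative stand-ins, not Hive's actual classes:

```java
import java.util.List;

// Hypothetical simplification of the check HIVE-5137 wants in SQLOperation:
// a statement should report a result set only when its plan contains a
// FetchTask. Enum values below are stand-ins for Hive's Task subclasses.
public class ResultSetCheck {
    enum Task { MAP_RED_TASK, MOVE_TASK, DDL_TASK, FETCH_TASK }

    static boolean hasResultSet(List<Task> plan) {
        // Only a plan that produces rows to fetch should expose a ResultSet.
        return plan.contains(Task.FETCH_TASK);
    }

    public static void main(String[] args) {
        // "create table t2 as select * from t1" ends in move/DDL tasks, no fetch.
        System.out.println(hasResultSet(List.of(Task.MAP_RED_TASK, Task.MOVE_TASK, Task.DDL_TASK)));
        // A plain "select * from t1" ends in a FetchTask.
        System.out.println(hasResultSet(List.of(Task.FETCH_TASK)));
    }
}
```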
[jira] [Created] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
Brock Noland created HIVE-5197: -- Summary: TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Bug Reporter: Brock Noland Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5060) JDBC driver assumes executeStatement is synchronous
[ https://issues.apache.org/jira/browse/HIVE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756847#comment-13756847 ] Henry Robinson commented on HIVE-5060: -- [~vgumashta] - sorry about the delay, I've uploaded the patch to review board here: https://reviews.apache.org/r/13948/ The approach in HIVE-4569 will be more general, but this fixes an immediate issue for other implementations of the HS2 API at very little cost to Hive. BTW, I believe the test failures from the patch are unrelated. JDBC driver assumes executeStatement is synchronous --- Key: HIVE-5060 URL: https://issues.apache.org/jira/browse/HIVE-5060 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Henry Robinson Fix For: 0.11.1, 0.12.0 Attachments: 0001-HIVE-5060-JDBC-driver-assumes-executeStatement-is-sy.patch, HIVE-5060.patch The JDBC driver seems to assume that {{ExecuteStatement}} is a synchronous call when performing updates via {{executeUpdate}}, where the following comment on the RPC in the Thrift file indicates otherwise: {code} // ExecuteStatement() // // Execute a statement. // The returned OperationHandle can be used to check on the // status of the statement, and to fetch results once the // statement has finished executing. {code} I understand that Hive's implementation of {{ExecuteStatement}} is blocking (see https://issues.apache.org/jira/browse/HIVE-4569), but presumably other implementations of the HiveServer2 API (and I'm talking specifically about Impala here, but others might have a similar concern) should be free to return a pollable {{OperationHandle}} per the specification. The JDBC driver's {{executeUpdate}} is as follows: {code} public int executeUpdate(String sql) throws SQLException { execute(sql); return 0; } {code} {{execute(sql)}} discards the {{OperationHandle}} that it gets from the server after determining whether there are results to be fetched. 
This is problematic for us, because Impala will cancel queries that are still running when a session exits, but there's no easy way to be sure that an {{INSERT}} statement has completed before terminating a session on the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
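The polling the Thrift comment allows for can be sketched as follows. OperationHandle and the state enum here are stand-ins for the Thrift-generated HiveServer2 types, and a real client would also sleep between polls and honor cancellation:

```java
// Hedged sketch of the fix direction in HIVE-5060: after ExecuteStatement,
// poll the returned OperationHandle until it reaches a terminal state instead
// of assuming the RPC was synchronous. Types below are illustrative stand-ins.
public class ExecutePoll {
    enum State { RUNNING, FINISHED, ERROR }

    interface OperationHandle { State getState(); }

    // Block until the operation leaves RUNNING; a production client would
    // back off between polls and enforce a timeout.
    static State waitForCompletion(OperationHandle op) {
        State s;
        while ((s = op.getState()) == State.RUNNING) { /* poll again */ }
        return s;
    }

    public static void main(String[] args) {
        // Simulate a server that reports RUNNING twice before FINISHED.
        int[] polls = {0};
        OperationHandle op = () -> polls[0]++ < 2 ? State.RUNNING : State.FINISHED;
        System.out.println(waitForCompletion(op));
        System.out.println("polls=" + polls[0]);
    }
}
```

Against Hive's blocking implementation the loop exits after one status check, which matches the "single extra RPC" cost claimed in the review request below.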
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Attachment: HIVE-5197.patch Trivial patch attached. TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Priority: Minor Attachments: HIVE-5197.patch Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
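For context on why a utility like HCatMapRedUtil beats new'ing the context directly: TaskAttemptContext is a concrete class in Hadoop 1 but became an interface (backed by TaskAttemptContextImpl) in Hadoop 2, so direct construction ties test code to one Hadoop line. A hedged sketch of the factory idea, using placeholder types rather than the real Hadoop/HCatalog classes:

```java
// Illustration of the shim-factory pattern behind HIVE-5197: callers ask a
// factory for a context instead of naming a version-specific concrete class.
// TaskContext and the two records are stand-ins, not Hadoop's actual types.
public class ContextFactoryDemo {
    interface TaskContext { String attemptId(); }

    // Stand-ins for the Hadoop-1 and Hadoop-2 concrete context classes.
    record Hadoop1Context(String attemptId) implements TaskContext {}
    record Hadoop2Context(String attemptId) implements TaskContext {}

    static TaskContext createTaskAttemptContext(boolean hadoop2, String id) {
        // The concrete class is chosen in exactly one place; callers compile
        // and run unchanged against either Hadoop line.
        return hadoop2 ? new Hadoop2Context(id) : new Hadoop1Context(id);
    }

    public static void main(String[] args) {
        System.out.println(createTaskAttemptContext(true, "attempt_001").attemptId());
        System.out.println(createTaskAttemptContext(false, "attempt_002").attemptId());
    }
}
```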
Review Request 13948: JDBC driver assumes executeStatement is synchronous
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13948/ --- Review request for hive. Bugs: HIVE-5060 https://issues.apache.org/jira/browse/HIVE-5060 Repository: hive-git Description --- This patch adds polling after the executeStatement call. In Hive's case, this results in a single extra RPC; for other implementations that may have made executeStatement asynchronous, it allows INSERTs to run to completion before returning from the JDBC-level execute method. Diffs - jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 982ceb8 Diff: https://reviews.apache.org/r/13948/diff/ Testing --- Confirmed that the INSERT problem goes away when run against Cloudera Impala, and that Hive requests see no observable latency penalty. Thanks, Henry Robinson
[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756817#comment-13756817 ] Sergey Shelukhin commented on HIVE-4914: somehow phabricator created the new review... not sure why filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which is most cases), the client gets all partition names from metastore, filters them, and asks for partitions by names for the filtered set. Metastore server code should do that instead; it should check if pushdown is possible and do it if so; otherwise it should do name-based filtering. Saves the roundtrip with all partition names from the server to client, and also removes the need to have pushdown viability checking on both sides. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5129: --- Fix Version/s: 0.12.0 Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12.0 Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5197) TestE2EScenerios.createTaskAttempt should use MapRedUtil
[ https://issues.apache.org/jira/browse/HIVE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5197: --- Assignee: Brock Noland Status: Patch Available (was: Open) TestE2EScenerios.createTaskAttempt should use MapRedUtil Key: HIVE-5197 URL: https://issues.apache.org/jira/browse/HIVE-5197 Project: Hive Issue Type: Test Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-5197.patch Basically we should use HCatMapRedUtil as opposed to new'ing the task attempt context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: RFC: Major HCatalog refactoring
Edward, If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me. Actually, this is not what we are testing. The package name change (as well as any changes made in 0.12) will be tested by the current tests (which will also change package name). The goal of bringing the 0.11 version of the source (and the corresponding tests) into 0.12 is to ensure that users who use HCatalog from scripts/MR jobs, etc. (e.g. a Pig script: A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();) will not have to update all their scripts/programs when upgrading to 0.12. Having the 0.11 tests in the 0.12 branch ensures that this compatibility layer continues to work while Hive 0.12 and later versions are evolving.
[jira] [Updated] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4914: -- Attachment: HIVE-4914.D12561.2.patch sershe updated the revision HIVE-4914 [jira] filtering via partition name should be done inside metastore server (implementation). Try to attach the patch manually Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12561 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12561?vs=39117id=39441#toc MANIPHEST TASKS https://reviews.facebook.net/T63 AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java metastore/if/hive_metastore.thrift metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java metastore/src/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java To: JIRA, 
ashutoshc, sershe filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which it is in most cases), the client gets all partition names from the metastore, filters them, and asks for partitions by name for the filtered set. The metastore server code should do that instead; it should check whether pushdown is possible and do it if so; otherwise it should do name-based filtering. This saves the roundtrip of all partition names from the server to the client, and also removes the need to have pushdown-viability checking on both sides. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
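The server-side flow the description proposes — filter the cheap partition name strings first, then materialize only the survivors — can be sketched in isolation. The names and helper below are illustrative, not the metastore's actual API:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hedged sketch of the HIVE-4914 flow: when expression pushdown to the
// underlying store is impossible, filter partition *names* on the server and
// only fetch the matching partitions, instead of shipping every name to the
// client and back. Method and type names are illustrative.
public class ServerSideNameFilter {
    static List<String> getPartitionsByFilter(List<String> allNames,
                                              Predicate<String> nameFilter) {
        // One server-side pass over the names; only the survivors would then
        // be materialized as full Partition objects and sent over the wire.
        return allNames.stream().filter(nameFilter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("ds=2013-09-01/hr=11", "ds=2013-09-01/hr=12",
                                     "ds=2013-09-02/hr=11");
        // Name predicate equivalent to the filter: ds = '2013-09-01'
        System.out.println(getPartitionsByFilter(names, n -> n.startsWith("ds=2013-09-01")));
    }
}
```

The win is that the client-to-server roundtrip carries one filter instead of the full name list in each direction.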
[jira] [Commented] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory
[ https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756866#comment-13756866 ] Daniel Dai commented on HIVE-4441: -- [~thejas] DistributedFileSystem.getHomeDirectory() has an annoying makeQualified(): {code} return new Path("/user/" + dfs.ugi.getShortUserName()).makeQualified(this); {code} I can't find an HDFS method which gives us the simple form without qualification. [~ekoifman] For the s3 file system, if the user specifies statusdir=s3://myoutput, the user means an absolute path. However, s3://myoutput is a relative path as per hdfs (isAbsolute()==false). But we cannot convert it into s3://user//myoutput since s3://user does not belong to the user. So here we skip the s3/asv filesystems. An e2e test is included in HIVE-5078 (e.g. Pig_9, in which we check the location of the stdout/stderr/syslog files). Sorry for the confusion. [HCatalog] WebHCat does not honor user home directory - Key: HIVE-4441 URL: https://issues.apache.org/jira/browse/HIVE-4441 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4441-1.patch, HIVE-4441-2.patch, HIVE-4441-3.patch If I submit a job as user A and I specify statusdir as a relative path, I would expect results to be stored in the folder relative to user A's home folder. For example, if I run: {code}curl -s -d user.name=hdinsightuser -d execute=show+tables; -d statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code} I get the results under: {code}/user/hdp/pokes.output{code} And I expect them to be under: {code}/user/hdinsightuser/pokes.output{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
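The resolution rule discussed above — rebase a relative statusdir under the submitting user's home directory, but leave scheme-qualified URIs like s3://myoutput alone — can be sketched with a small helper. This is an illustrative simplification, not WebHCat's actual code:

```java
import java.net.URI;

// Hedged sketch of the HIVE-4441 behavior: a relative statusdir resolves under
// /user/<submitting user>, while a path with its own scheme (s3, asv, hdfs, ...)
// or a leading slash is left as given, since e.g. s3://myoutput cannot be
// rebased under the user's HDFS home. Names here are illustrative.
public class StatusDirResolver {
    static String resolve(String statusdir, String user) {
        URI uri = URI.create(statusdir);
        if (uri.getScheme() != null || statusdir.startsWith("/")) {
            return statusdir;                      // scheme-qualified or already absolute
        }
        return "/user/" + user + "/" + statusdir;  // relative: rebase on the home dir
    }

    public static void main(String[] args) {
        System.out.println(resolve("pokes.output", "hdinsightuser"));
        System.out.println(resolve("s3://myoutput", "hdinsightuser"));
    }
}
```

With this rule, the curl example in the description lands under /user/hdinsightuser/pokes.output for user.name=hdinsightuser rather than under the server principal's home.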
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756867#comment-13756867 ] Hive QA commented on HIVE-5149: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601170/HIVE-5149.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 2905 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/592/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/592/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4959: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed this to trunk. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. Vectorized plan generation should be added as an optimization transform. Key: HIVE-4959 URL: https://issues.apache.org/jira/browse/HIVE-4959 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4959.1.patch, HIVE-4959.2.patch, HIVE-4959.3.patch Currently the query plan is vectorized at the query run time in the map task. It will be much cleaner to add vectorization as an optimization step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756916#comment-13756916 ] Hudson commented on HIVE-5129: -- FAILURE: Integrated in Hive-trunk-hadoop2 #401 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/401/]) HIVE-5129 Multiple table insert fails on count distinct (Vikram Dixit via Harish Butani) (rhbutani: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1519764) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby3.q * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby3.q.out Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12.0 Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form: {noformat} from studenttab10k insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name; {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756886#comment-13756886 ] Ashutosh Chauhan edited comment on HIVE-4959 at 9/3/13 7:05 PM: Committed this to branch. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. was (Author: ashutoshc): Committed this to trunk. Thanks, Jitendra! Instead of maintaining a static list of vectorizable operator and expressions in Vectorizer class, better way to do this is via adding an annotation on UDF and using that. We should do that in a follow-up jira. Vectorized plan generation should be added as an optimization transform. Key: HIVE-4959 URL: https://issues.apache.org/jira/browse/HIVE-4959 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4959.1.patch, HIVE-4959.2.patch, HIVE-4959.3.patch Currently the query plan is vectorized at the query run time in the map task. It will be much cleaner to add vectorization as an optimization step. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5152: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Jitendra! Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756920#comment-13756920 ] Ashutosh Chauhan commented on HIVE-5152: For completeness, shall we make VectorFileSinkOp extend from FileSinkOp. Or is that not worthwhile? Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756933#comment-13756933 ] Thejas M Nair commented on HIVE-5131: - The changes look good. For the unit test, instead of making a change to the url used by all tests in TestBeeLineWithArgs, can you change it so that the url can be customized per test? Maybe something like this -
{code}
final static String BASE_JDBC_URL = BeeLine.BEELINE_DEFAULT_JDBC_URL + localhost:1
// set JDBC_URL to something else in test case, if it needs to be customized
String JDBC_URL = BASE_JDBC_URL;
// Use JDBC_URL to connect
{code}
JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
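To make the suggestion above concrete, here is a minimal sketch of per-test URL customization. The BASE_JDBC_URL value, the helper method, and the trailing "#key=value" hive-variable suffix are illustrative assumptions for this sketch, not the actual TestBeeLineWithArgs code.

```java
public class JdbcUrlSketch {
    // Shared base URL, as in the suggestion above; host/port are placeholders.
    static final String BASE_JDBC_URL = "jdbc:hive2://localhost:10000/default";

    // A test that needs hive variables derives its own URL from the base;
    // tests that don't need customization just use the base unchanged.
    static String withHiveVars(String base, String hiveVars) {
        if (hiveVars == null || hiveVars.isEmpty()) {
            return base;
        }
        return base + "#" + hiveVars;
    }

    public static void main(String[] args) {
        System.out.println(withHiveVars(BASE_JDBC_URL, ""));
        System.out.println(withHiveVars(BASE_JDBC_URL, "myvar=value"));
    }
}
```

This keeps the default connection path untouched for existing tests while letting the new hive-variables test build exactly the URL it needs.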
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5131: Status: Open (was: Patch Available) JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
Re: RFC: Major HCatalog refactoring
I understand. Can we do something like this? oldpackage.HCatLoader extends newpackage.HCatLoader { } If we do something like this we don't need to test both classes; it is safe to assume they both do the same thing. I understand that we do not want users to have to specify a new class name, but 15 minutes of unit tests around a re-name is overkill. On Tue, Sep 3, 2013 at 2:13 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Edward, If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me. Actually this is not what we are testing. The package name change (as well as any changes made in 0.12) will be tested by current tests (which will also change package name). The goal of bringing the 0.11 version of the source (and corresponding tests) into 0.12 is to ensure that users who use HCatalog from scripts/MR jobs, etc (e.g. a Pig script: A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();) will not have to update all their scripts/programs when upgrading to 0.12. Having 0.11 tests in the 0.12 branch ensures that this compatibility layer continues to work while Hive 0.12 and later versions are evolving. On Tue, Sep 3, 2013 at 10:22 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I would say a main goal of unit and integration testing is to try all code paths. If a testing framework is truly testing all code paths twice, there is not much of a win there from a unit/integration tests standpoint. If the unit tests created more coverage of the code that would be an obvious win. I have not looked at your patch but from your description it sounds like we are attempting to test a rename that does not sound like a win to me.
If the current hcatalog tests run in 15 minutes, you make a change and then the run is 30 minutes. 15 minutes is a nice long coffee break, 30 minutes is a TV show :) As for the overall hive build taking 10-15 hours. I know that :) I used to run them, by hand, on my laptop, because no one would share their build farm with me. I have heard that Hive consumes the vast majority of the resources of apache's build farm! I think we need to be good citizens at apache and attempt to make this better, not worse. Now that we have pre-commit builds we can work at a reasonable pace. Now that we have this nice pre-commit farm, I do not want to create a precedent that now we can go nuts, and start down the same slippery slope. On Tue, Sep 3, 2013 at 12:57 PM, Eugene Koifman ekoif...@hortonworks.com wrote: Current (sequential) run of all hive/hcat unit tests takes 10-15 hours. Is another 20-30 minutes that significant? I'm generally wary of unit tests that are not run continuously and automatically. It delays the detection of problems, and then what was probably an obvious fix at the time the change was made becomes a long debugging session (often by someone other than the person whose change broke things). I think this is especially true given how many people are contributing to hive. On Tue, Sep 3, 2013 at 7:25 AM, Brock Noland br...@cloudera.com wrote: OK that should be fine. Though I would echo Edward's sentiment about adding so much test time. Do these tests have to run each time? Does it make sense to have a test target such as test-all-hcatalog and then run them periodically by hand, especially before releases? On Mon, Sep 2, 2013 at 10:36 AM, Eugene Koifman ekoif...@hortonworks.com wrote: These will be new (i.e. 0.11 version) test classes which will be in the old org.apache.hcatalog package. How does that affect the new framework? On Saturday, August 31, 2013, Brock Noland wrote: Will these be new Java class files or new test methods to existing classes?
I am just curious as to how this will play into the distributed testing framework. On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman ekoif...@hortonworks.com wrote: not quite double but close (on my Mac that means it will go up from 35 minutes to 55-60), so in the greater scheme of things it should be negligible. On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote: By coverage do you mean to say that: Thus, the published HCatalog JARs will contain both packages and the unit tests will cover both versions of the API. We are going to double the time of unit tests for this module? On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman
[jira] [Commented] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756921#comment-13756921 ] Ashutosh Chauhan commented on HIVE-5152: Oh sorry, missed that one. VectorFS already extends from FS. Vector operators should inherit from non-vector operators for code re-use. -- Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5152.1.patch In many cases vectorized operators could share code from non-vector operators by inheriting.
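The inheritance pattern discussed in HIVE-5152 can be sketched as follows. The class names echo the FileSinkOperator/VectorFileSinkOperator pairing mentioned above, but the fields and method signatures here are simplified illustrations, not Hive's actual operator API: the vectorized subclass reuses the parent's state and plumbing and adds only a per-batch hot path.

```java
// Row-mode operator: processes one row per call.
class FileSinkOperator {
    protected long rowsWritten = 0;

    public void process(Object row) {
        rowsWritten++;
    }

    public long getRowsWritten() {
        return rowsWritten;
    }
}

// Vectorized operator: inherits configuration/state handling and adds a
// batch-at-a-time path, so only the hot loop is reimplemented.
class VectorFileSinkOperator extends FileSinkOperator {
    public void processBatch(Object[] batch) {
        rowsWritten += batch.length;
    }
}

public class OperatorInheritanceSketch {
    public static void main(String[] args) {
        VectorFileSinkOperator op = new VectorFileSinkOperator();
        op.process("row");
        op.processBatch(new Object[] {"a", "b", "c"});
        System.out.println(op.getRowsWritten()); // 4
    }
}
```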
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756936#comment-13756936 ] Thejas M Nair commented on HIVE-5131: - While you are at it, can you also make a minor change to the javadoc of TestBeeLineWithArgs.testScriptFile? Change @param expecttedPattern Text to look for in command output to @param expecttedPattern Text to look for in command output/error JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only. JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue.
[jira] [Commented] (HIVE-4441) [HCatalog] WebHCat does not honor user home directory
[ https://issues.apache.org/jira/browse/HIVE-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756894#comment-13756894 ] Eugene Koifman commented on HIVE-4441: -- I think this needs to be in the JavaDoc and at least a debug level log statement needs to be added to indicate where the data is actually written. [HCatalog] WebHCat does not honor user home directory - Key: HIVE-4441 URL: https://issues.apache.org/jira/browse/HIVE-4441 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4441-1.patch, HIVE-4441-2.patch, HIVE-4441-3.patch If I submit a job as user A and I specify statusdir as a relative path, I would expect results to be stored in the folder relative to the user A's home folder. For example, if I run: {code}curl -s -d user.name=hdinsightuser -d execute=show+tables; -d statusdir=pokes.output 'http://localhost:50111/templeton/v1/hive'{code} I get the results under: {code}/user/hdp/pokes.output{code} And I expect them to be under: {code}/user/hdinsightuser/pokes.output{code}
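The behavior requested in HIVE-4441 can be sketched as a small resolution rule. The method below is hypothetical (not WebHCat's code); the /user/&lt;name&gt; layout follows the HDFS home-directory convention, and the commented LOG line illustrates the debug statement Eugene asks for.

```java
public class StatusDirSketch {
    // Resolve statusdir the way the bug report expects: absolute paths are
    // honored as-is, relative paths land under the submitting user's home.
    static String resolveStatusDir(String statusdir, String userName) {
        if (statusdir.startsWith("/")) {
            return statusdir;
        }
        String resolved = "/user/" + userName + "/" + statusdir;
        // A debug-level log here would make the destination discoverable:
        // LOG.debug("statusdir resolved to " + resolved);
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveStatusDir("pokes.output", "hdinsightuser"));
        System.out.println(resolveStatusDir("/tmp/out", "hdinsightuser"));
    }
}
```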
[jira] [Commented] (HIVE-5113) webhcat should allow configuring memory used by templetoncontroller map job in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756941#comment-13756941 ] Eugene Koifman commented on HIVE-5113: -- [~thejas] Some comments: 1. can templeton.controller.map.mem be named mapreduce.map.memory.mb, i.e. the same as the Hadoop prop? I think it is easier to follow if props are not renamed. Or at least webhcat(templeton).mapreduce.map.memory.mb if you are worried about namespace collisions. (same for other new props) 2. The description in webhcat-default.xml: Could it say that this is a Hadoop option and is passed directly/as is to the Templeton Controller MR job? (In some cases the default is specified as '512' and in others '-Xmx300m'.) It would tell users where to go get more doc that explains what these props do. 3. templeton.controller.mr.child.opts is defined in the code but not in webhcat-default.xml. Is that intentional? 4. public static final String HADOOP_MAP_JAVA_OPTS = mapreduce.map.java.opts; public static final String HADOOP_MAP_MEMORY = mapreduce.map.memory.mb; public static final String HADOOP_AM_MEMORY = yarn.app.mapreduce.am.resource.mb; public static final String HADOOP_AM_JAVA_OPTS = yarn.app.mapreduce.am.command-opts; Are there symbolic constants in the Hadoop code base for these? Can they be used here? webhcat should allow configuring memory used by templetoncontroller map job in hadoop2 -- Key: HIVE-5113 URL: https://issues.apache.org/jira/browse/HIVE-5113 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5113.1.patch Webhcat should allow the following hadoop2 config parameters to be set for the templetoncontroller map-only job that actually runs the pig/hive/mr command: mapreduce.map.memory.mb, yarn.app.mapreduce.am.resource.mb, yarn.app.mapreduce.am.command-opts. They should also be set to reasonable defaults.
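Point 2 above, about documenting the pass-through nature of these knobs, might look like the following webhcat-default.xml entry. The property name follows the comment's suggestion of mirroring the Hadoop name under a templeton prefix, and the value is illustrative; this is a sketch, not the actual shipped default.

```xml
<!-- Illustrative entry; name and value are assumptions from the discussion above. -->
<property>
  <name>templeton.mapreduce.map.memory.mb</name>
  <value>512</value>
  <description>Hadoop option mapreduce.map.memory.mb, passed as-is to the
    TempletonControllerJob map task. See the Hadoop MapReduce documentation
    for the semantics of this option.</description>
</property>
```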
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756943#comment-13756943 ] Xuefu Zhang commented on HIVE-3976: --- [~jdere] Thanks for posting your code regarding precision/scale, and your comments about related UDFs. There seems to be a lot of work, but we hope the outcome is worth the effort. It's good that you have gained insight with your char/varchar work. It will be valuable. [~hagleitn] Thanks for sharing your thoughts. I agree that this is complex enough to need a spec, with which the issue may be closed quicker and more easily by a large community. The questions you posted are valid and yet to be answered. The existing decimal feature seems incomplete and non-standard in many ways. With this task, we can hope to put it in good shape. In principle, I think we should follow the standard if available, and follow some implementation or have hive's own implementation when the standard does not define something. Doing that seems to make it unavoidable to break backward compatibility. But how much can we break? For instance, can we say that a decimal without precision and scale specified defaults to (10, 0) (as mysql does) rather than the current (38, ?)? It's great if we can redefine everything and do it right, once and for all. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang Attachments: remove_prec_scale.diff HIVE-2693 introduced support for the Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too.
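What DECIMAL(precision, scale) enforcement means can be illustrated with java.math.BigDecimal (this is an illustration of the semantics, not Hive's implementation): scale fixes the digits after the decimal point, and a value with more than precision significant digits after scaling must be rejected.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalSketch {
    // Hypothetical enforcement helper: round to the declared scale, then
    // verify the result fits within the declared total precision.
    static BigDecimal enforce(BigDecimal v, int precision, int scale) {
        BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
        if (scaled.precision() > precision) {
            throw new ArithmeticException(
                "value does not fit DECIMAL(" + precision + "," + scale + ")");
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Fits DECIMAL(20,2): up to 18 integer digits, 2 fractional digits.
        System.out.println(enforce(new BigDecimal("1234.567"), 20, 2)); // 1234.57
    }
}
```

Under this rule a default of (10, 0), as in MySQL, would silently truncate fractional digits and cap values at 10 integer digits, which is the backward-compatibility question raised above.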
Re: RFC: Major HCatalog refactoring
You may have already said this but remind me again. If we go with this approach, how long until we retire the duplicated code and insist end users use the new name? 1 release? A similar debate is likely why the hive classes are still packaged as org.apache.hadoop.hive, rather than org.apache.hive.
Re: RFC: Major HCatalog refactoring
There are already a few things in hcat that I would argue are things we do not need to test. Example: testAMQListener. I would say it is a bit out of scope to have a component like this. Could we mock the message queue? Do we need these dependencies in the project? I cannot even fathom how long the tests will run post vectorization, post tez. Maybe I am the only one who worries about these things.
Re: RFC: Major HCatalog refactoring
We explored the idea you suggest and given the number of APIs (and their transitive closure) it would be very difficult and the result would be fragile. So unfortunately that is not possible. For example, oldpackage.A has a method foo() that returns oldpackage.B. You could create newpackage.A extends oldpackage.A { @Override newpackage.B foo() { } } which works because of covariant return types, but the implementation of foo() becomes problematic because it itself uses other classes.
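Eugene's covariant-return example from this thread can be sketched as follows. Since separate packages cannot be shown in one file, the Old*/New* prefixes stand in for oldpackage/newpackage; all names are illustrative. The override compiles because NewB is a subtype of OldB, but note that foo()'s body must construct the new type itself, and in a real API every referenced type needs the same treatment, which is why the approach was judged fragile.

```java
// "old" API
class OldB {
}

class OldA {
    OldB foo() {
        return new OldB();
    }
}

// "new" API: the covariant override is legal since Java 5 because
// NewB is a subtype of OldB, but the body must re-wrap every type it touches.
class NewB extends OldB {
}

class NewA extends OldA {
    @Override
    NewB foo() {
        return new NewB();
    }
}

public class CovariantSketch {
    public static void main(String[] args) {
        // Callers holding the old type transparently get the new behavior...
        OldA a = new NewA();
        System.out.println(a.foo() instanceof NewB); // true
        // ...but this only works if every method returning an old-package
        // type is overridden, across the API's whole transitive closure.
    }
}
```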
[jira] [Commented] (HIVE-5129) Multiple table insert fails on count(distinct)
[ https://issues.apache.org/jira/browse/HIVE-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756812#comment-13756812 ] Harish Butani commented on HIVE-5129: - Committed to trunk. Thanks, Vikram! Multiple table insert fails on count(distinct) -- Key: HIVE-5129 URL: https://issues.apache.org/jira/browse/HIVE-5129 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: aggrTestMultiInsertData1.txt, aggrTestMultiInsertData.txt, aggrTestMultiInsert.q, HIVE-5129.1.patch.txt, HIVE-5129.2.WIP.patch.txt, HIVE-5129.3.patch.txt, HIVE-5129.4.patch, HIVE-5129.4.patch.txt Hive fails with a class cast exception on queries of the form:
{noformat}
from studenttab10k
insert overwrite table multi_insert_2_1 select name, avg(age) as avgage group by name
insert overwrite table multi_insert_2_2 select name, age, sum(gpa) as sumgpa group by name, age
insert overwrite table multi_insert_2_3 select name, count(distinct age) as distage group by name;
{noformat}
[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5149: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Yin! groupby2.q failed because .q.out needed an update, which I did. TestMTQueries didn't fail for me and looks flaky. That test takes an inordinately long time (30 mins) to execute and seems to spend all its time waiting to release locks in ZK. There is some threading issue going on there. Needs some investigation. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Resolved] (HIVE-4586) [HCatalog] WebHCat should return 404 error for undefined resource
[ https://issues.apache.org/jira/browse/HIVE-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved HIVE-4586. -- Resolution: Fixed Hadoop Flags: Reviewed Patch committed to trunk. [HCatalog] WebHCat should return 404 error for undefined resource - Key: HIVE-4586 URL: https://issues.apache.org/jira/browse/HIVE-4586 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4586-1.patch, HIVE-4586-2.patch
[jira] [Created] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)
Eugene Koifman created HIVE-5198: Summary: WebHCat returns exitcode 143 (w/o an explanation) Key: HIVE-5198 URL: https://issues.apache.org/jira/browse/HIVE-5198 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.11.0 Reporter: Eugene Koifman Filing this bug mostly to help anyone trying to decipher the 143 error code, which does not appear in the source code. In 0.12 error reporting was improved and this case reports a stack trace. This error code means that the metastore client could not connect to the metastore. This is likely a config issue, with hive.metastore.uris not being set. The message might look like this: {"statement":"use default; show table extended like xyz;","error":"unable to show table: xyz","exec":{"stdout":"","stderr":"","exitcode":"143"}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
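(Editor's note) A minimal client-side hive-site.xml entry for the hive.metastore.uris setting mentioned above might look like the following sketch; the host and port are placeholders, not values taken from this report:

```xml
<property>
  <name>hive.metastore.uris</name>
  <!-- placeholder host/port; point this at your running metastore service -->
  <value>thrift://metastore-host.example.com:9083</value>
</property>
```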
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756969#comment-13756969 ] Yin Huai commented on HIVE-5149: groupby2.q is also used in TestMTQueries. Probably the failure of TestMTQueries was also caused by groupby2. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756973#comment-13756973 ] Ashutosh Chauhan commented on HIVE-5186: I am not sure we want this. Wherever possible RSDeDup should compact the query plan (since it avoids adding new operators in the pipeline). Only for cases which cannot be handled by RSDeDup should the Correlation Optimizer kick in. In this patch, it seems like you are removing a case which can be handled by RSDeDup. Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care of patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756975#comment-13756975 ] Ashutosh Chauhan commented on HIVE-5149: Right. Yeah, likely that was the reason for TestMTQueries failures. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Priority: Blocker Fix For: 0.12.0 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch, HIVE-5149.3.patch https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5186: --- Status: Open (was: Patch Available) Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5112) Upgrade protobuf to 2.5 from 2.4
[ https://issues.apache.org/jira/browse/HIVE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757012#comment-13757012 ] Hive QA commented on HIVE-5112: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12601174/HIVE-5112.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2906 tests executed *Failed tests:* {noformat} org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/593/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/593/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Upgrade protobuf to 2.5 from 2.4 Key: HIVE-5112 URL: https://issues.apache.org/jira/browse/HIVE-5112 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Owen O'Malley Attachments: HIVE-5112.2.patch, HIVE-5112.D12429.1.patch Hadoop and HBase have both upgraded protobuf. We should as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5113) webhcat should allow configuring memory used by templetoncontroller map job in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756978#comment-13756978 ] Thejas M Nair commented on HIVE-5113: - canceling patch while comments are addressed. webhcat should allow configuring memory used by templetoncontroller map job in hadoop2 -- Key: HIVE-5113 URL: https://issues.apache.org/jira/browse/HIVE-5113 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5113.1.patch WebHCat should allow the following hadoop2 config parameters to be set for the templetoncontroller map-only job that actually runs the pig/hive/mr command: mapreduce.map.memory.mb yarn.app.mapreduce.am.resource.mb yarn.app.mapreduce.am.command-opts They should also be set to reasonable defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
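(Editor's note) The three hadoop2 parameters named in the issue above could be supplied in a Hadoop job configuration file like the following sketch; the values shown are illustrative placeholders, not the defaults this issue proposes:

```xml
<!-- sketch: memory settings for the templeton controller map-only job -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- placeholder: container size for the map task -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value> <!-- placeholder: container size for the MR app master -->
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx819m</value> <!-- placeholder: JVM heap, kept below the container size -->
</property>
```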
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: HIVE-4443-3.patch [~ekoifman] That's fine, I don't have a preference on the API. I also borrowed the code from other parts of Templeton. Reattached the patch with the API change, resynced with trunk. [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4443-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter should be an optional parameter to the call, independent of the security check. I would suggest an additional parameter so that GET queue (jobs) gives you all the jobs a user has permission for: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5131: -- Attachment: HIVE-5131.1.patch Patch is updated based on review comments. JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5187) Enhance explain to indicate vectorized execution of operators.
[ https://issues.apache.org/jira/browse/HIVE-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5187: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Jitendra! Enhance explain to indicate vectorized execution of operators. -- Key: HIVE-5187 URL: https://issues.apache.org/jira/browse/HIVE-5187 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-5187.1.patch Explain should be able to indicate whether an operator will be executed in vectorized mode or not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: HIVE-4442-3.patch [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4442: - Attachment: (was: HIVE-4443-3.patch) [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode
[ https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757019#comment-13757019 ] Phabricator commented on HIVE-4617: --- thejas has commented on the revision HIVE-4617 [jira] ExecuteStatementAsync call to run a query in non-blocking mode. INLINE COMMENTS conf/hive-default.xml.template:1851 the value here also needs to be updated to 500 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java:144 The client is not getting the error details in case of async exec code path. We need to address that. service/src/java/org/apache/hive/service/cli/session/SessionManager.java:59 Do you plan to make the change to new thread pool as part of this jira ? If not, you might want to set the default number of async threads to be lower, and increase it as part of the new thread pool change. REVISION DETAIL https://reviews.facebook.net/D12507 To: JIRA, vaibhavgumashta Cc: cwsteinbach, thejas ExecuteStatementAsync call to run a query in non-blocking mode -- Key: HIVE-4617 URL: https://issues.apache.org/jira/browse/HIVE-4617 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Jaideep Dhok Assignee: Vaibhav Gumashta Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, HIVE-4617.D12507.2.patch, HIVE-4617.D12507Test.1.patch Provide a way to run a queries asynchronously. Current executeStatement call blocks until the query run is complete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5186) Remove JoinReducerProc from ReduceSinkDeDuplication
[ https://issues.apache.org/jira/browse/HIVE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757010#comment-13757010 ] Yin Huai commented on HIVE-5186: Right, I am removing cases associated with the pattern JOIN%.*%RS%. Here are my reasons. 1. It seems JoinReducerProc will kick in only when hive.auto.convert.join=false and hive.auto.convert.join.noconditionaltask=false. Because of these conditions, I thought it may be hard to trigger this part of the code in practice. 2. For some cases, I am not sure if we can generate an executable plan when both JoinReducerProc and Correlation Optimizer fire. For example,
{code}
      JOIN3
      /    \
   GBY1    GBY2
     |       |
   JOIN1   JOIN2
{code}
If all of these five operators share the same key, JoinReducerProc will drop the RS between JOIN1 and GBY1, and drop the RS between JOIN2 and GBY2. Then, Correlation Optimizer will try to drop the RSs associated with JOIN3. Since there is no Mux between JOIN1 and GBY1, and between JOIN2 and GBY2, I am not sure if the plan is executable. But I have not tried this case. Will give it a try and post what I find. Remove JoinReducerProc from ReduceSinkDeDuplication --- Key: HIVE-5186 URL: https://issues.apache.org/jira/browse/HIVE-5186 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Attachments: HIVE-5186.1.patch.txt Correlation Optimizer will take care of patterns involving JoinOperator. We can remove JoinReducerProc from ReduceSinkDeDuplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756979#comment-13756979 ] Sushanth Sowmyan commented on HIVE-5104: Actually, on manual application, the whitespace errors are easy enough to fix. I'm uploading a whitespace-corrected version of the patch and will try to get this patch in before the freeze if tests pass. HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt Unable to store boolean values into an HCat table. Assume in Hive you have two tables... CREATE TABLE btest(test BOOLEAN); CREATE TABLE btest2(test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get ERROR 115: Unsupported type 5 in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. Might have been overlooked when boolean was added to Pig in 0.10. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4442) [HCatalog] WebHCat should not override user.name parameter for Queue call
[ https://issues.apache.org/jira/browse/HIVE-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757061#comment-13757061 ] Eugene Koifman commented on HIVE-4442: -- The point is that UgiFactory creates a proxy user with proper credentials, while UserGroupInformation.createRemoteUser() works in simple security mode... Generally, in WebHCat a param user is determined by Server#getDoAsUser(). If doAs is specified, the user=doAs, otherwise it's the user making the call. In the HIVE-4442.3.patch StatusDelegator uses UgiFactory to get UserGroupInformation but the other 2 use UserGroupInformation.createRemoteUser(). So from a security point of view I think Delete/List/StatusDelegator should all use UgiFactory with user as argument. UserGroupInformation.getLoginUser() will return the user running WebHCat (hcat by default). [HCatalog] WebHCat should not override user.name parameter for Queue call - Key: HIVE-4442 URL: https://issues.apache.org/jira/browse/HIVE-4442 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Daniel Dai Attachments: HIVE-4442-1.patch, HIVE-4442-2.patch, HIVE-4442-3.patch Currently templeton for the Queue call uses the user.name to filter the results of the call in addition to the default security. Ideally the filter is an optional parameter to the call independent of the security check. I would suggest a parameter in addition to GET queue (jobs) give you all the jobs a user have permission: GET queue?showall=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5199) Read Only Custom SerDe works with HDP 1.1 but not with HDP 1.3
Hari Sankar Sivarama Subramaniyan created HIVE-5199: --- Summary: Read Only Custom SerDe works with HDP 1.1 but not with HDP 1.3 Key: HIVE-5199 URL: https://issues.apache.org/jira/browse/HIVE-5199 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Custom serdes which used to work in HDP 1.1 are no longer working with HDP 1.3. The issue happens when the partition serde is not of settable type in HDP 1.3. The exception below occurs via FetchOperator as well as MapOperator. Inside FetchOperator, consider the following call: getRecordReader() -> ObjectInspectorConverters.getConverter(). The output object inspector is of settable type (because it is generated via ObjectInspectorConverters.getConvertedOI()), whereas the input object inspector, which is passed in as serde.getObjectInspector(), is non-settable. Inside getConverter(), the (inputOI.equals(outputOI)) check fails and the switch statement tries to cast the non-settable object inspector to a settable object inspector.
The stack trace is as follows:
2013-08-28 17:57:25,307 ERROR CliDriver (SessionState.java:printError(432)) - Failed with exception java.io.IOException:java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
java.io.IOException: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:307)
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:406)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
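(Editor's note) The failing cast in the HIVE-5199 report above can be illustrated with a self-contained sketch. The interfaces below are simplified stand-ins for Hive's object-inspector hierarchy (the real ones live in org.apache.hadoop.hive.serde2.objectinspector); the point is that an unchecked cast to a settable interface throws ClassCastException for a read-only inspector, which is what getConverter() runs into when the equality check fails.

```java
// Simplified stand-ins for Hive's object-inspector interfaces.
interface MapObjectInspector {}
interface SettableMapObjectInspector extends MapObjectInspector {}

// A read-only inspector, like the custom ProtoMapObjectInspector in this
// report: it implements the base interface but not the settable variant.
class ReadOnlyMapObjectInspector implements MapObjectInspector {}

public class SettableCastDemo {
    // Mirrors the problematic step in getConverter(): an unchecked cast
    // that assumes every inspector reaching this point is settable.
    static boolean castToSettable(MapObjectInspector inputOI) {
        try {
            SettableMapObjectInspector settable = (SettableMapObjectInspector) inputOI;
            return settable != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A read-only inspector cannot be cast to the settable interface.
        System.out.println(castToSettable(new ReadOnlyMapObjectInspector())); // prints "false"
    }
}
```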
Re: Review Request 11334: HIVE-4568 Beeline needs to support resolving variables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/#review25859 --- beeline/src/java/org/apache/hive/beeline/BeeLine.java https://reviews.apache.org/r/11334/#comment50420 that's fine beeline/src/java/org/apache/hive/beeline/BeeLine.properties https://reviews.apache.org/r/11334/#comment50421 It is more accurate to say that this is a hive-specific variable, as beeline is still a generic tool. beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java https://reviews.apache.org/r/11334/#comment50422 Yes, makes sense to call this hivevariables itself. beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java https://reviews.apache.org/r/11334/#comment50423 This change has not been made in the revised patch. - Thejas Nair On Aug. 24, 2013, 8:19 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11334/ --- (Updated Aug. 24, 2013, 8:19 p.m.) Review request for hive and Ashutosh Chauhan. Bugs: HIVE-4568 https://issues.apache.org/jira/browse/HIVE-4568 Repository: hive-git Description --- 1. Added command variable substitution 2. Added test case Diffs - beeline/src/java/org/apache/hive/beeline/BeeLine.java 4c6eb9b beeline/src/java/org/apache/hive/beeline/BeeLine.properties b6650cf beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 61bdeee beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java c70003d beeline/src/test/org/apache/hive/beeline/src/test/TestBeeLineWithArgs.java 030f6b0 Diff: https://reviews.apache.org/r/11334/diff/ Testing --- Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757047#comment-13757047 ] Sergey Shelukhin commented on HIVE-5107: I was thinking about splitting the metastore client from the server, not just thrift (thrift should be in the client too), so that users of the metastore wouldn't have to depend on the server. In particular, right now the metastore server cannot use anything in QL without indirect code, because QL uses metastore client/common bits. Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5096) Add q file tests for ORC predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5096: --- Status: Open (was: Patch Available) There is already an over10k file in data/files/. Can you use that one in your tests instead of adding a new one? Add q file tests for ORC predicate pushdown --- Key: HIVE-5096 URL: https://issues.apache.org/jira/browse/HIVE-5096 Project: Hive Issue Type: Test Components: CLI, File Formats, StorageHandler Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-5096.patch Add q file tests that check the validity of the results when predicate pushdown is turned on and off. Also test for filter expressions in the table scan operator when predicate pushdown is turned on for ORC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757049#comment-13757049 ] Sergey Shelukhin commented on HIVE-5107: I figure if there will be disturbance in the build anyway, I can do it right after, to not have disturbance for too long :) Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on GitHub for review. Then we can talk about switching the project somehow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5049) Create an ORC test case that has a 0.11 ORC file
[ https://issues.apache.org/jira/browse/HIVE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757080#comment-13757080 ] Ashutosh Chauhan commented on HIVE-5049: +1 Create an ORC test case that has a 0.11 ORC file Key: HIVE-5049 URL: https://issues.apache.org/jira/browse/HIVE-5049 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Prasanth J Attachments: HIVE-5049.patch.txt, orc-file-11-format.orc We should add a test case that includes a 0.11.0 ORC file to ensure compatibility for reading old ORC files is kept correct. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757043#comment-13757043 ] Thejas M Nair commented on HIVE-5131: - [~xuefuz] This was never documented in the doc page, so this is actually adding a new feature. Can you add documentation for this part of the URL format in the release note of the jira? Once this is committed it can be moved to the wiki page as an upcoming 0.12 feature. The wiki page - https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to track it as a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
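(Editor's note) The HiveServer2 Clients wiki referenced above documents the JDBC connection URL shape with hive variables carried in the fragment part of the string. A sketch, with placeholder host, port, and variable names (confirm the exact grammar against the wiki):

```
jdbc:hive2://<host>:<port>/<dbName>;sess_var_list?hive_conf_list#hive_var_list

For example (placeholders):
jdbc:hive2://localhost:10000/default#hivevar1=value1;hivevar2=value2
```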
[jira] [Commented] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows
[ https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757091#comment-13757091 ] Sushanth Sowmyan commented on HIVE-5014: Filename was not set appropriately for jenkins auto-build to pick it up. However, I have manually verified this. +1, committing. [HCatalog] Fix HCatalog build issue on Windows -- Key: HIVE-5014 URL: https://issues.apache.org/jira/browse/HIVE-5014 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5014-1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5157) ReduceSinkDeDuplication ignores hive.groupby.skewindata=true
[ https://issues.apache.org/jira/browse/HIVE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5157: --- Description: If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q and groupby2.q (was: If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q) ReduceSinkDeDuplication ignores hive.groupby.skewindata=true - Key: HIVE-5157 URL: https://issues.apache.org/jira/browse/HIVE-5157 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai If hive.groupby.skewindata=true, we should generate two MR jobs. But, ReduceSinkDeDuplication will merge these two into a single MR job. Example: groupby2_map_skew.q and groupby2.q
[jira] [Updated] (HIVE-5014) [HCatalog] Fix HCatalog build issue on Windows
[ https://issues.apache.org/jira/browse/HIVE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5014: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks, Daniel. [HCatalog] Fix HCatalog build issue on Windows -- Key: HIVE-5014 URL: https://issues.apache.org/jira/browse/HIVE-5014 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-5014-1.patch
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756967#comment-13756967 ] Sushanth Sowmyan commented on HIVE-5104: Hi, from looking through it, the patch looks good from a functionality perspective. Thank you for the patch. However, it does not apply cleanly on trunk, has whitespace issues (trailing whitespace), and needs regeneration. As part of HIVE-4869, all HCatalog jiras will be frozen for a couple of days, as the package renaming effort happening there affects all jiras. Could you please regenerate your patch after that and re-apply? HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
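The failure mode described in the report (a type-mapping switch with no boolean case falling through to an "unsupported type" error) can be sketched minimally. This is not the actual HCatBaseStorer code; the class, method, and the INTEGER/CHARARRAY constants below are hypothetical, though BOOLEAN = 5 matches the type id in the reported "Unsupported type 5" error:

```java
// Minimal sketch of the reported bug pattern: a switch mapping Pig type
// codes to Hive type names that omits BOOLEAN, so boolean values hit the
// "unsupported" default branch. All names here are illustrative.
public class TypeMappingSketch {
    public static final byte BOOLEAN = 5;    // type id seen in "Unsupported type 5"
    public static final byte INTEGER = 10;   // illustrative value
    public static final byte CHARARRAY = 55; // illustrative value

    public static String hiveTypeFor(byte pigType) {
        switch (pigType) {
            case INTEGER:   return "int";
            case CHARARRAY: return "string";
            // Missing: case BOOLEAN: return "boolean";  <-- the reported bug
            default:
                throw new IllegalArgumentException("Unsupported type " + pigType);
        }
    }

    public static void main(String[] args) {
        System.out.println(hiveTypeFor(INTEGER));   // prints int
        try {
            hiveTypeFor(BOOLEAN);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());     // prints Unsupported type 5
        }
    }
}
```

The fix suggested by the report is simply adding the missing `case BOOLEAN` arm to the switch.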
[jira] [Updated] (HIVE-5131) JDBC client's hive variables are not passed to HS2
[ https://issues.apache.org/jira/browse/HIVE-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5131: Status: Patch Available (was: Open) +1. Making it Patch Available to kick off the tests. JDBC client's hive variables are not passed to HS2 -- Key: HIVE-5131 URL: https://issues.apache.org/jira/browse/HIVE-5131 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-5131.1.patch, HIVE-5131.patch, HIVE-5131.patch Related to HIVE-2914. However, HIVE-2914 seems to address the Hive CLI only; JDBC clients suffer the same problem. This was identified in HIVE-4568. I decided it might be better to separate this into a different issue.
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-1511: --- Attachment: HIVE-1511.11.patch v11 fixes the HBase tests. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.10.patch, HIVE-1511.11.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, KryoHiveTest.java, run.sh As reported by Edward Capriolo: For reference, I did this as a test case: SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM, but I gave up after the test case did not go anywhere for about 2 minutes.
[jira] [Commented] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757028#comment-13757028 ] Sushanth Sowmyan commented on HIVE-5104: One more required change: null, as presented in the modification to the test, does not succeed; it needs to be NULL. HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt, HIVE-5104.2.patch Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
[jira] [Updated] (HIVE-5104) HCatStorer fails to store boolean type
[ https://issues.apache.org/jira/browse/HIVE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-5104: --- Attachment: HIVE-5104.2.patch HCatStorer fails to store boolean type -- Key: HIVE-5104 URL: https://issues.apache.org/jira/browse/HIVE-5104 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ron Frohock Attachments: HIVE-5104.1.patch.txt, HIVE-5104.2.patch Unable to store boolean values to an HCat table. Assume in Hive you have two tables: CREATE TABLE btest (test BOOLEAN); CREATE TABLE btest2 (test BOOLEAN); Then in Pig: A = LOAD 'btest' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'btest2' USING org.apache.hcatalog.pig.HCatStorer(); You will get "ERROR 115: Unsupported type 5" in Pig's schema. Checking HCatBaseStorer.java, the case statement for data types doesn't check for booleans. This might have been overlooked when boolean was added to Pig in 0.10.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757092#comment-13757092 ] Ashutosh Chauhan commented on HIVE-4750: +1 Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.
[jira] [Commented] (HIVE-5096) Add q file tests for ORC predicate pushdown
[ https://issues.apache.org/jira/browse/HIVE-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757094#comment-13757094 ] Prasanth J commented on HIVE-5096: -- I just manually added a few empty (NULL) values to specific columns for checking against NULL/NOT NULL predicates. Since the minimum index stride is 1000, I kept the total number of rows in the file at 1050 so that there are 2 index strides in the ORC file. If required, I can remove the few columns that are not being used by the tests and keep only the relevant columns. Add q file tests for ORC predicate pushdown --- Key: HIVE-5096 URL: https://issues.apache.org/jira/browse/HIVE-5096 Project: Hive Issue Type: Test Components: CLI, File Formats, StorageHandler Affects Versions: 0.12.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-5096.patch Add q file tests that check the validity of the results when predicate pushdown is turned on and off. Also test for filter expressions in the table scan operator when predicate pushdown is turned on for ORC.
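The row count in the comment above is chosen so the test file spans more than one ORC row-index stride. A quick sketch of that arithmetic (the class and method names are mine, not part of the ORC API):

```java
// Sketch: how many index strides an ORC file's row index covers, given a
// row count and a row-index stride. 1050 rows at a 1000-row stride yields
// 2 strides, matching the comment above.
public class StrideCount {
    // Ceiling division: strides needed to cover `rows` rows at `stride` rows each
    public static int strides(int rows, int stride) {
        return (rows + stride - 1) / stride;
    }

    public static void main(String[] args) {
        System.out.println(strides(1050, 1000)); // prints 2
        System.out.println(strides(1000, 1000)); // prints 1
    }
}
```

With two strides in the file, a pushed-down predicate can be observed skipping one stride while reading the other, which is what makes 1050 a useful row count for these tests.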
[jira] [Updated] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4750: --- Resolution: Fixed Status: Resolved (was: Patch Available) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757112#comment-13757112 ] Ashutosh Chauhan commented on HIVE-4750: Tested 6,7,8 on both Mac OS and Ubuntu after the patch. All 3 passed on both OSes. Committed to trunk. Thanks, Prasanth! Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.2.patch, HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.