[jira] [Commented] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768092#comment-13768092 ] Mohammad Kamrul Islam commented on HIVE-5221: - So you are asking not to decode/encode by default. Give the raw bytes. In this case, we need to just remove the conditional decoding, right? Will it break any backward compatibility? Issue in column type with data type as BINARY Key: HIVE-5221 URL: https://issues.apache.org/jira/browse/HIVE-5221 Project: Hive Issue Type: Bug Reporter: Arun Vasu Assignee: Mohammad Kamrul Islam Priority: Critical Attachments: HIVE-5221.1.patch Hi, I am using Hive 10. When I create an external table with a column of type BINARY, the query result on the table shows junk values for the column with the binary datatype. Please find below the query I have used to create the table: CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN, email STRING, bitfld BINARY) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/hivetables/testbinary'; The query I have used is: select * from bool1 The sample data in the hdfs file is: 0^a...@abc.com^001 1^a...@abc.com^010 ^a...@abc.com^011 ^a...@abc.com^100 t^a...@abc.com^101 f^a...@abc.com^110 true^a...@abc.com^111 false^a...@abc.com^001 123^^01100010 12344^^0111 Please share your inputs if it is possible. Thanks, Arun -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
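The decoding under discussion can be sketched in isolation. This is an illustrative stand-in, not Hive's serde code: a conditional Base64 decode silently turns a text field that merely *looks* like Base64 into unrelated bytes, which is one way "junk values" appear.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative stand-in for the behavior under discussion -- not Hive's serde code.
public class BinaryFieldSketch {

    // Conditional decoding: treat the field as Base64 when it parses cleanly,
    // otherwise fall back to the raw bytes.
    static byte[] conditionalDecode(String field) {
        try {
            return Base64.getDecoder().decode(field);
        } catch (IllegalArgumentException e) {
            return field.getBytes(StandardCharsets.UTF_8);
        }
    }

    // The proposed alternative: always hand back the raw bytes.
    static byte[] rawBytes(String field) {
        return field.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "0111" (from the reporter's sample data) happens to be valid Base64,
        // so the conditional path decodes it to 3 unrelated bytes instead of
        // the 4 ASCII characters the user stored.
        System.out.println(conditionalDecode("0111").length); // 3
        System.out.println(rawBytes("0111").length);          // 4
    }
}
```

Whether dropping the conditional decode breaks backward compatibility is exactly the open question in the comment: existing tables whose text data really is Base64-encoded would start returning the encoded bytes verbatim.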
[jira] [Updated] (HIVE-5276) Skip useless string encoding stage for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5276: -- Attachment: D12879.2.patch navis updated the revision HIVE-5276 [jira] Skip useless string encoding stage for hiveserver2. Fixed test fails, addressed comments Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D12879 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12879?vs=39897&id=40023#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/Driver.java ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchFormatter.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java To: JIRA, navis Cc: cwsteinbach Skip useless string encoding stage for hiveserver2 -- Key: HIVE-5276 URL: https://issues.apache.org/jira/browse/HIVE-5276 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: D12879.2.patch, HIVE-5276.D12879.1.patch Currently HiveServer2 acquires rows in the string format used for CLI output, converts them back into rows, and finally converts them to the output format. This is inefficient and memory-consuming.
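The round trip being removed can be sketched with a toy example. Method names and the tab-delimited format here are illustrative, not Hive's actual fetch path: the old route renders every row to display text and re-parses it; the new route hands the fields straight to the output formatter.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of the round trip HIVE-5276 removes -- not Hive's code.
public class FetchPathSketch {

    // Old path: row -> display string -> fields again (extra work and garbage).
    static List<String> viaDisplayString(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(row[i]);
        }
        // Re-parse the text we just built back into fields.
        return Arrays.asList(sb.toString().split("\t", -1));
    }

    // New path: convert each typed field directly for the output formatter.
    static List<String> direct(Object[] row) {
        String[] out = new String[row.length];
        for (int i = 0; i < row.length; i++) out[i] = String.valueOf(row[i]);
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        Object[] row = {1, "a@abc.com", true};
        System.out.println(viaDisplayString(row));
        System.out.println(direct(row));
    }
}
```

Both paths produce the same fields; the direct path simply skips building and re-splitting the intermediate string for every row.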
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768123#comment-13768123 ] Phabricator commented on HIVE-5279: --- ashutoshc has accepted the revision HIVE-5279 [jira] Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc. +1 REVISION DETAIL https://reviews.facebook.net/D12963 BRANCH HIVE-5279 ARCANIST PROJECT hive To: JIRA, ashutoshc, navis Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0
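The "missing no-arg constructor" error above boils down to reflective instantiation: Kryo's default object creation needs a no-arg constructor on the class being deserialized. A dependency-free sketch of that failure mode using plain reflection (the class names are made up):

```java
// Sketch of the instantiation requirement Kryo reports -- plain reflection,
// no Kryo dependency; class names are illustrative.
public class NoArgCtorSketch {

    static class WithNoArg {
        WithNoArg() {}
    }

    static class WithoutNoArg {
        final int x;
        WithoutNoArg(int x) { this.x = x; }
    }

    // Can this class be created the way a default deserializer creates it?
    static boolean instantiable(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException here is the same root cause Kryo reports
            // as "Class cannot be created (missing no-arg constructor)".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiable(WithNoArg.class));    // true
        System.out.println(instantiable(WithoutNoArg.class)); // false
    }
}
```

This is why the failure surfaces per-class (here `StandardListObjectInspector`): any object reachable from the serialized plan must either have a no-arg constructor or a custom instantiation strategy registered.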
[jira] [Updated] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions
[ https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5122: -- Attachment: D12411.3.patch navis updated the revision HIVE-5122 [jira] Add partition for multiple partition ignores locations for non-first partitions. Rebased to trunk, addressed comment (Path to Location, which is filtered by QTestUtil) Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D12411 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12411?vs=38499&id=40029#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java ql/src/test/queries/clientpositive/add_part_exist.q ql/src/test/results/clientpositive/add_part_exist.q.out ql/src/test/results/clientpositive/create_view_partitioned.q.out To: JIRA, navis Add partition for multiple partition ignores locations for non-first partitions --- Key: HIVE-5122 URL: https://issues.apache.org/jira/browse/HIVE-5122 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D12411.3.patch, HIVE-5122.D12411.1.patch, HIVE-5122.D12411.2.patch http://www.mail-archive.com/user@hive.apache.org/msg09151.html When multiple partitions are added in a single alter table statement, the location for the first partition is used as the location of all partitions.
[jira] [Commented] (HIVE-5276) Skip useless string encoding stage for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768142#comment-13768142 ] Phabricator commented on HIVE-5276: --- cwsteinbach has accepted the revision HIVE-5276 [jira] Skip useless string encoding stage for hiveserver2. +1. Please go ahead and commit this if the tests pass. REVISION DETAIL https://reviews.facebook.net/D12879 BRANCH HIVE-5276 ARCANIST PROJECT hive To: JIRA, cwsteinbach, navis Cc: cwsteinbach Skip useless string encoding stage for hiveserver2 -- Key: HIVE-5276 URL: https://issues.apache.org/jira/browse/HIVE-5276 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: D12879.2.patch, HIVE-5276.D12879.1.patch Currently HiveServer2 acquires rows in the string format used for CLI output, converts them back into rows, and finally converts them to the output format. This is inefficient and memory-consuming.
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5279: --- Assignee: Navis Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0
[jira] [Updated] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5292: Status: Patch Available (was: Open) Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Reporter: Sergio Lob Assignee: Navis Attachments: D12969.1.patch Join on matching decimal columns returns 0 rows. To reproduce (I used beeline): 1. create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ;
[jira] [Updated] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5292: -- Attachment: D12969.1.patch navis requested code review of HIVE-5292 [jira] Join on decimal columns fails to return rows. Reviewers: JIRA HIVE-5292 Join on decimal columns fails to return rows Join on matching decimal columns returns 0 rows. To reproduce (I used beeline): 1. create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ; TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12969 AFFECTED FILES common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java ql/src/test/queries/clientpositive/decimal_join.q ql/src/test/results/clientpositive/decimal_join.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/30981/ To: JIRA, navis Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Reporter: Sergio Lob Assignee: Navis Attachments: D12969.1.patch Join on matching decimal columns returns 0 rows. To reproduce (I used beeline): 1.
create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ;
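Whatever the exact fix in HiveDecimal turns out to be, the classic way decimal join keys fail to match is scale-sensitive equality: values that compare numerically equal still differ under equals()/hashCode(), so hash-join buckets never line up. A minimal java.math.BigDecimal illustration (not HiveDecimal itself):

```java
import java.math.BigDecimal;

// Scale-sensitive decimal equality: numerically equal, but not equal as keys.
public class DecimalKeySketch {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("0.98");
        BigDecimal b = new BigDecimal("0.980");
        System.out.println(a.compareTo(b) == 0);          // true: same numeric value
        System.out.println(a.equals(b));                  // false: scales differ (2 vs 3)
        System.out.println(a.hashCode() == b.hashCode()); // false: different hash buckets
    }
}
```

Any decimal type used as a join or group-by key therefore has to normalize (or ignore) scale in its equals() and hashCode(), which is why values like the reporter's `.98` can silently match nothing.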
[jira] [Created] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
Douglas created HIVE-5296: - Summary: Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Fix For: 0.12.0 This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. Multiple connections to HiveServer2, all of which are closed and disposed of properly, show the Java heap size growing extremely quickly. This issue can be recreated using the following code
{code}
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection.
 */
public class HiveClient {
    Connection con;
    Logger logger;
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private String db;

    public HiveClient(String db) {
        logger = Logger.getLogger(HiveClient.class);
        this.db = db;
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection " + e.getMessage());
        }
    }

    public int update(String query) {
        Integer res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
            stmt.close();
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                + "but check the error: %s from the query %s", query, e.toString()));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("Cannot close the statement, potentially a memory leak " + e);
            }
        }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
    }
}
{code}
And by creating and closing many HiveClient objects. The heap space used by the hiveserver2 RunJar process is seen to increase extremely quickly, without that space being released.
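Independent of the server-side leak, the client above closes its Statement and Connection manually, which leaks whenever an exception escapes between creation and close(). try-with-resources closes on every path; this sketch uses a stand-in AutoCloseable so it runs without a Hive driver, but java.sql.Connection and Statement can be wrapped the same way.

```java
// Leak-proof close pattern via try-with-resources. Resource is a stand-in
// for java.sql.Connection/Statement so the sketch runs without a driver.
public class CloseSketch {

    static int openCount = 0; // how many resources are currently open

    static class Resource implements AutoCloseable {
        Resource() { openCount++; }
        void use(boolean fail) {
            if (fail) throw new RuntimeException("query failed");
        }
        @Override public void close() { openCount--; }
    }

    static void runQuery(boolean fail) {
        // Both resources are closed in reverse order on every exit path,
        // including when use() throws.
        try (Resource con = new Resource(); Resource stmt = new Resource()) {
            stmt.use(fail);
        } catch (RuntimeException e) {
            // log and continue
        }
    }

    public static void main(String[] args) {
        runQuery(false);
        runQuery(true);
        System.out.println("leaked resources: " + openCount); // 0
    }
}
```

This rules out the client as the source of the growth when reproducing the report: if the heap still climbs with guaranteed closes, the retention is on the HiveServer2 side.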
Deleting the old Wiki
Google searches still put the old wiki at the top of many results, which is a *very bad thing* for beginners. Can someone delete this wiki or at least redirect to the Confluence wiki? dean -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com
Re: Deleting the old Wiki
Hi, Years ago I helped with the wiki conversion. The page I can find is: http://wiki.apache.org/hadoop/Hive which isn't harmful. What pages are you finding? Brock On Mon, Sep 16, 2013 at 8:51 AM, Dean Wampler deanwamp...@gmail.com wrote: Google searches still put the old wiki at the top of many results, which is a *very bad thing* for beginners. Can someone delete this wiki or at least redirect to the Confluence wiki? dean -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Commented] (HIVE-5292) Join on decimal columns fails to return rows
[ https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768398#comment-13768398 ] Phabricator commented on HIVE-5292: --- ashutoshc has accepted the revision HIVE-5292 [jira] Join on decimal columns fails to return rows. +1 REVISION DETAIL https://reviews.facebook.net/D12969 BRANCH HIVE-5292 ARCANIST PROJECT hive To: JIRA, ashutoshc, navis Join on decimal columns fails to return rows Key: HIVE-5292 URL: https://issues.apache.org/jira/browse/HIVE-5292 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Reporter: Sergio Lob Assignee: Navis Attachments: D12969.1.patch Join on matching decimal columns returns 0 rows. To reproduce (I used beeline): 1. create 2 simple identical tables with 2 identical rows: CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'; 2. populate tables with identical data: LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ; LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ; 3. data file decdata contains: 10|.98 20|1234567890.1234 4. Perform join (returns 0 rows instead of 2): SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON T1.D = T2.D ;
[jira] [Commented] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call
[ https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768487#comment-13768487 ] Eugene Koifman commented on HIVE-4443: -- Are there any tests that cover the new functionality? [HCatalog] Have an option for GET queue to return all job information in single call - Key: HIVE-4443 URL: https://issues.apache.org/jira/browse/HIVE-4443 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, HIVE-4443-4.patch Currently, to display a summary of all jobs, one has to call GET queue to retrieve all the jobids and then call GET queue/:jobid for each job. It would be nice to do this in a single call. I would suggest: * GET queue - mark as deprecated * GET queue/jobID - mark as deprecated * DELETE queue/jobID - mark as deprecated * GET jobs - return the list of JSON objects with jobids but no detailed info * GET jobs/fields=* - return the list of JSON objects containing detailed job info * GET jobs/jobID - return the single JSON object containing the detailed job info for the job with the given ID (equivalent to GET queue/jobID) * DELETE jobs/jobID - equivalent to DELETE queue/jobID NO PRECOMMIT TESTS
[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig
[ https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768497#comment-13768497 ] Eugene Koifman commented on HIVE-4444: -- Could the comments be made more detailed? For example, Server#hive() adds 2 params. Could you add a few words about what they are for, or a URL to Hive documentation that explains where one can get more info? Can the new tests be described in a bit more detail, or have a pointer to some place that describes what the tests are testing? This will help others in the future. [HCatalog] WebHCat Hive should support equivalent parameters as Pig Key: HIVE-4444 URL: https://issues.apache.org/jira/browse/HIVE-4444 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4444-1.patch, HIVE-4444-2.patch, HIVE-4444-3.patch Currently there are no files and args parameters in Hive. We shall add them to make it similar to Pig. NO PRECOMMIT TESTS
[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5279: --- Status: Open (was: Patch Available) The following tests failed: * TestParse * TestCliDriver_autogen_colalias.q * TestCliDriver_create_udaf.q * TestCliDriver_create_view.q * TestCliDriver_limit_pushdown.q * TestCliDriver_show_functions.q * TestCliDriver_udaf_sum_list.q * TestCliDriver_udf_percentile.q Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0
[jira] [Commented] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support
[ https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768416#comment-13768416 ] Hudson commented on HIVE-5278: -- FAILURE: Integrated in Hive-trunk-h0.21 #2335 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2335/]) HIVE-5278 : Move some string UDFs to GenericUDFs, for better varchar support (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1523518) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFConcat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLower.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUpper.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLower.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUpper.java * /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/udf6.q.xml Move some string UDFs to GenericUDFs, for better varchar support Key: HIVE-5278 URL: https://issues.apache.org/jira/browse/HIVE-5278 Project: Hive Issue Type: Improvement Components: Types, UDF Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch To better support varchar/char types in string UDFs, select UDFs should be converted to GenericUDFs. This allows the UDF to return the resulting char/varchar length in the type metadata. This work is being split off as a separate task from HIVE-4844. The initial UDFs as part of this work are concat/lower/upper.
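The benefit of the GenericUDF conversion is that the output type is computed at initialization time, so concat over varchar operands can declare a precise result length instead of widening to string. A plain-Java sketch of that length arithmetic (the helper name is hypothetical, not Hive's GenericUDFConcat code; 65535 is assumed here as Hive's varchar maximum):

```java
// Hypothetical sketch of the type-metadata arithmetic enabled by GenericUDFs.
public class VarcharLenSketch {

    // Assumed maximum varchar length in Hive.
    static final int MAX_VARCHAR_LENGTH = 65535;

    // For concat(varchar(a), varchar(b), ...) the tightest result type is
    // varchar(a + b + ...), capped at the type's maximum length.
    static int concatResultLength(int... operandLengths) {
        long sum = 0;
        for (int len : operandLengths) sum += len;
        return (int) Math.min(sum, MAX_VARCHAR_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(concatResultLength(10, 20));      // 30
        System.out.println(concatResultLength(65000, 1000)); // capped at 65535
    }
}
```

The old UDF-bridge classes could not express this, because their return type was fixed by the method signature rather than derived from the operand types.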
[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call
[ https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4443: - Attachment: (was: HIVE-4443-4.patch) [HCatalog] Have an option for GET queue to return all job information in single call - Key: HIVE-4443 URL: https://issues.apache.org/jira/browse/HIVE-4443 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, HIVE-4443-4.patch Currently, to display a summary of all jobs, one has to call GET queue to retrieve all the jobids and then call GET queue/:jobid for each job. It would be nice to do this in a single call. I would suggest: * GET queue - mark as deprecated * GET queue/jobID - mark as deprecated * DELETE queue/jobID - mark as deprecated * GET jobs - return the list of JSON objects with jobids but no detailed info * GET jobs/fields=* - return the list of JSON objects containing detailed job info * GET jobs/jobID - return the single JSON object containing the detailed job info for the job with the given ID (equivalent to GET queue/jobID) * DELETE jobs/jobID - equivalent to DELETE queue/jobID NO PRECOMMIT TESTS
[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call
[ https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4443: - Attachment: HIVE-4443-4.patch [HCatalog] Have an option for GET queue to return all job information in single call - Key: HIVE-4443 URL: https://issues.apache.org/jira/browse/HIVE-4443 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, HIVE-4443-4.patch Currently, to display a summary of all jobs, one has to call GET queue to retrieve all the jobids and then call GET queue/:jobid for each job. It would be nice to do this in a single call. I would suggest: * GET queue - mark as deprecated * GET queue/jobID - mark as deprecated * DELETE queue/jobID - mark as deprecated * GET jobs - return the list of JSON objects with jobids but no detailed info * GET jobs/fields=* - return the list of JSON objects containing detailed job info * GET jobs/jobID - return the single JSON object containing the detailed job info for the job with the given ID (equivalent to GET queue/jobID) * DELETE jobs/jobID - equivalent to DELETE queue/jobID NO PRECOMMIT TESTS
[jira] [Commented] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call
[ https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768557#comment-13768557 ] Daniel Dai commented on HIVE-4443: -- Test is in HIVE-5078. [HCatalog] Have an option for GET queue to return all job information in single call - Key: HIVE-4443 URL: https://issues.apache.org/jira/browse/HIVE-4443 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, HIVE-4443-4.patch Currently, to display a summary of all jobs, one has to call GET queue to retrieve all the jobids and then call GET queue/:jobid for each job. It would be nice to do this in a single call. I would suggest: * GET queue - mark as deprecated * GET queue/jobID - mark as deprecated * DELETE queue/jobID - mark as deprecated * GET jobs - return the list of JSON objects with jobids but no detailed info * GET jobs/fields=* - return the list of JSON objects containing detailed job info * GET jobs/jobID - return the single JSON object containing the detailed job info for the job with the given ID (equivalent to GET queue/jobID) * DELETE jobs/jobID - equivalent to DELETE queue/jobID NO PRECOMMIT TESTS
[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768530#comment-13768530 ] Edward Capriolo commented on HIVE-5253: --- https://reviews.facebook.net/differential/diff/40041/ Create component to compile and jar dynamic code Key: HIVE-5253 URL: https://issues.apache.org/jira/browse/HIVE-5253 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5253.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4998) support jdbc documented table types in default configuration
[ https://issues.apache.org/jira/browse/HIVE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-4998: Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) support jdbc documented table types in default configuration Key: HIVE-4998 URL: https://issues.apache.org/jira/browse/HIVE-4998 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.12.0 Attachments: HIVE-4998.1.patch The jdbc table types supported by hive server2 are not the documented typical types [1] in jdbc; they are hive-specific types (MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW). HIVE-4573 added support for the jdbc documented typical types, but the HS2 default configuration is to return the hive types. The default configuration should result in the expected jdbc typical behavior. [1] http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[])
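The gap between the two type vocabularies can be illustrated with a small mapping sketch. This is not HS2's actual implementation (HIVE-4573 made the behavior configurable); the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of mapping Hive's internal table types onto the
// "classic" types documented for java.sql.DatabaseMetaData.getTables().
class TableTypeMapper {
    private static final Map<String, String> CLASSIC = new HashMap<>();
    static {
        CLASSIC.put("MANAGED_TABLE", "TABLE");
        CLASSIC.put("EXTERNAL_TABLE", "TABLE");
        CLASSIC.put("VIRTUAL_VIEW", "VIEW");
    }

    // Falls back to the Hive-specific name if no classic equivalent is known.
    static String toClassic(String hiveType) {
        return CLASSIC.getOrDefault(hiveType, hiveType);
    }
}
```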
[jira] [Commented] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768545#comment-13768545 ] Hudson commented on HIVE-5161: -- FAILURE: Integrated in Hive-trunk-hadoop2 #433 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/433/]) HIVE-5161 : Additional SerDe support for varchar type (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1523532) * /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java * /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto * /hive/trunk/ql/src/test/queries/clientpositive/varchar_serde.q * /hive/trunk/ql/src/test/results/clientpositive/varchar_serde.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: D12897.1.patch, HIVE-5161.1.patch, HIVE-5161.2.patch, HIVE-5161.3.patch Breaking out support for varchar for the various SerDes as an additional task. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-5285: Status: Patch Available (was: Open) RB link updated with new diff file: https://reviews.apache.org/r/14144/ Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach for the HIVE-5199 fix is correct. However, the fix for HIVE-5199 is incomplete. Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspector<NonSettableStructObjectInspector> } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-5246: -- Attachment: (was: HIVE-5246.1.patch) Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5246-test.tar For a Hive query started by an Oozie Hive action, the local task submitted for a map join fails. The HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets the env variable HADOOP_TOKEN_FILE_LOCATION as well as the mapreduce.job.credentials.binary config property. However, this doesn't get passed down to the child JVM, which causes the problem. This is similar to the issue addressed by HIVE-4343, which addresses the problem for HiveServer2.
[jira] [Updated] (HIVE-4512) The vectorized plan is not picking right expression class for string concatenation.
[ https://issues.apache.org/jira/browse/HIVE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4512: -- Attachment: HIVE-4512.3-vectorization.patch Based patch off the latest vectorization branch The vectorized plan is not picking right expression class for string concatenation. --- Key: HIVE-4512 URL: https://issues.apache.org/jira/browse/HIVE-4512 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4512.1-vectorization.patch, HIVE-4512.2-vectorization.patch, HIVE-4512.3-vectorization.patch The vectorized plan is not picking right expression class for string concatenation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-5246: -- Attachment: HIVE-5246.1.patch Reattached the patch file Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar For a Hive query started by an Oozie Hive action, the local task submitted for a map join fails. The HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets the env variable HADOOP_TOKEN_FILE_LOCATION as well as the mapreduce.job.credentials.binary config property. However, this doesn't get passed down to the child JVM, which causes the problem. This is similar to the issue addressed by HIVE-4343, which addresses the problem for HiveServer2.
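The fix direction described above — sharing the delegation-token location with the child JVM — can be sketched with ProcessBuilder. The helper name is illustrative; this is not Hive's actual local-task launcher:

```java
// Sketch of propagating the Oozie-set delegation token location into the
// environment of the child JVM spawned for the local map-join task.
// ChildJvmEnv and withToken are hypothetical names for illustration.
class ChildJvmEnv {
    static ProcessBuilder withToken(ProcessBuilder pb, String tokenFile) {
        if (tokenFile != null) {
            // Child-side Hadoop picks the token up from this env variable.
            pb.environment().put("HADOOP_TOKEN_FILE_LOCATION", tokenFile);
        }
        return pb;
    }
}
```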
[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig
[ https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768626#comment-13768626 ] Thejas M Nair commented on HIVE-4444: - Daniel, I got some javadoc errors with the new patch. Can you please check? {code} [javadoc] /home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677: warning - @param argument file is not a parameter name. [javadoc] /home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677: warning - @param argument arg is not a parameter name. [javadoc] /home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677: warning - @param argument files is not a parameter name. [javadoc] /home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677: warning - @param argument define is not a parameter name. {code} [HCatalog] WebHCat Hive should support equivalent parameters as Pig Key: HIVE-4444 URL: https://issues.apache.org/jira/browse/HIVE-4444 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4444-1.patch, HIVE-4444-2.patch, HIVE-4444-3.patch, HIVE-4444-4.patch Currently there are no files and args parameters in Hive. We shall add them to make it similar to Pig. NO PRECOMMIT TESTS
[jira] [Updated] (HIVE-5288) Perflogger should log under single class
[ https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5288: --- Attachment: HIVE-5288.01.patch rebase patch Perflogger should log under single class Key: HIVE-5288 URL: https://issues.apache.org/jira/browse/HIVE-5288 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-5288.01.patch, HIVE-5288.patch Perflogger should log under single class, so that it could be turned on without mass logging spew. Right now the log is passed to it externally; this could be preserved by passing in a string to be logged as part of the message. Anyway most of the time it's called from Driver and Utilities, which is a pretty useless class name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
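The proposal above can be shown as a minimal sketch: the perf logger always logs under one fixed class, and the real caller is carried inside the message instead of being the logger category. PerfLoggerSketch and the StringBuilder stand-in for a real logger are assumptions, not Hive's actual PerfLogger:

```java
// Minimal sketch of a perf logger that logs under a single class; the
// caller name becomes part of the message string. The <PERFLOG ...> text
// mimics Hive-style perf-log lines but is illustrative only.
class PerfLoggerSketch {
    private final StringBuilder log = new StringBuilder();

    void perfLogBegin(String caller, String method) {
        log.append("<PERFLOG method=").append(method)
           .append(" from=").append(caller).append(">\n");
    }

    void perfLogEnd(String caller, String method) {
        log.append("</PERFLOG method=").append(method)
           .append(" from=").append(caller).append(">\n");
    }

    String output() {
        return log.toString();
    }
}
```

With this shape, turning on perf logging only requires enabling one logger category, while the caller (Driver, Utilities, ...) remains visible in each message.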
[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768648#comment-13768648 ] Eugene Koifman commented on HIVE-4531: -- Review Comments: Should this include e2e tests in addition to (or instead of) unit tests? If (when :) Hadoop changes the log file format this will break, but unit tests won't catch this since the data that the tests parse is static. Here is a bunch of little things/nits: o Server.java has "if (enablelog == true && !TempletonUtils.isset(statusdir)) throw new BadParam(enablelog is only applicable when statusdir is set);" in 4 different places. Can this be a method? o What is the purpose of Server#misc()? o TempletonControllerJob: import org.apache.hive.hcatalog.templeton.Main; - unused import oo Line 173 - indentation is off oo Line 295 - writer.close() - This writer is connected to System.err. What are the implications of closing this? What if something tries to write to it later? o TempletonUtils has unused imports - checkstyle needs to be run on the whole patch. o TestJobIDParser mixes JUnit3 and JUnit4. It should either not extend TestCase (I vote for this) or not use @Test annotations o Can JobIDParser (and all subclasses) be made package scoped since they are not used outside the templeton package? Similarly, can methods be made as private as possible? o JobIDParser#parseJobID() has an "fname" param which is not used. What is the intent? Should it be used in the openStatusFile() call? If not, better to remove it. o JobIDParser#openStatusFile() creates a Reader. Where/when is it being closed? o Could the 2 member variables in JobIDParser be made private (even final)? o Why is TestJobIDParser using findJobID() directly? Could it not use parseJobID()? o Can JobIDParser have 1 line of class-level javadoc about the purpose of this class? 
[WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does.
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768692#comment-13768692 ] Ashutosh Chauhan commented on HIVE-5279: Additionally, the following tests failed too: * TestContribCliDriver_udaf_example_avg.q * TestContribCliDriver_udaf_example_group_concat.q * TestContribCliDriver__udaf_example_max.q * TestContribCliDriver_udaf_example_max_n.q * TestContribCliDriver__udaf_example_min.q * TestContribCliDriver__udaf_example_max_n.q Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism solved this, but Kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs should be modified to be run on hive-0.13.0
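The root cause in the {noformat} block — Kryo's default object instantiation requires a no-arg constructor — can be demonstrated standalone with reflection. This helper is illustrative, not part of Hive or Kryo:

```java
import java.lang.reflect.Constructor;

// Checks the precondition behind Kryo's "Class cannot be created
// (missing no-arg constructor)" error. NoArgCheck is a hypothetical name.
class NoArgCheck {
    static boolean hasNoArgConstructor(Class<?> clazz) {
        for (Constructor<?> ctor : clazz.getDeclaredConstructors()) {
            if (ctor.getParameterTypes().length == 0) {
                return true;  // Kryo can instantiate this class by default
            }
        }
        return false;         // would trigger the error above
    }
}
```

A class like StandardListObjectInspector would need either such a constructor or a custom Kryo instantiator.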
[jira] [Updated] (HIVE-5288) Perflogger should log under single class
[ https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5288: --- Attachment: HIVE-5288.01.patch missed one place... Perflogger should log under single class Key: HIVE-5288 URL: https://issues.apache.org/jira/browse/HIVE-5288 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-5288.01.patch, HIVE-5288.01.patch, HIVE-5288.patch Perflogger should log under single class, so that it could be turned on without mass logging spew. Right now the log is passed to it externally; this could be preserved by passing in a string to be logged as part of the message. Anyway most of the time it's called from Driver and Utilities, which is a pretty useless class name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions
[ https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768695#comment-13768695 ] Thejas M Nair commented on HIVE-5122: - Navis, as I mentioned in my previous comment, with masking the test no longer checks if the locations are being picked up correctly. I.e., we won't know if someone introduces a bug that causes the same problem again and causes the first location to be associated with each of the partitions. The test case needs to be changed. Maybe add partitions with data and use a select query over the partitions to verify that the correct locations are associated with the partitions? Add partition for multiple partition ignores locations for non-first partitions --- Key: HIVE-5122 URL: https://issues.apache.org/jira/browse/HIVE-5122 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D12411.3.patch, HIVE-5122.D12411.1.patch, HIVE-5122.D12411.2.patch http://www.mail-archive.com/user@hive.apache.org/msg09151.html When multiple partitions are being added in a single alter table statement, the location for the first partition is being used as the location of all partitions.
[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768679#comment-13768679 ] Hive QA commented on HIVE-4961: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603380/HIVE-4961.3-vectorization.patch {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 3954 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump org.apache.hcatalog.listener.TestNotificationListener.testAMQListener org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask org.apache.hive.hcatalog.pig.TestHCatStorer.testPartColsInData org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreInPartiitonedTbl org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreMultiTables org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreWithNoSchema {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/763/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/763/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. 
Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4961.1-vectorization.patch, HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
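The bridge idea described above — run an unmodified scalar UDF element-wise over a whole batch — can be sketched as follows. Hive's real column vectors and batch types are richer; the class name and the long[] batch shape are simplifying assumptions, not Hive's actual adaptor:

```java
import java.util.function.LongUnaryOperator;

// Toy sketch of a row-mode-to-vectorized bridge: the existing scalar UDF
// is called once per row of the batch, producing an output column.
class VectorUdfBridge {
    static long[] apply(LongUnaryOperator udf, long[] batch) {
        long[] out = new long[batch.length];
        for (int i = 0; i < batch.length; i++) {
            out[i] = udf.applyAsLong(batch[i]);  // one row-mode call per row
        }
        return out;
    }
}
```

The batch loop keeps the rest of the query pipeline vectorized even though each UDF call itself remains row-at-a-time.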
[jira] [Resolved] (HIVE-5284) Some ObjectInspectors do not have a default constructor
[ https://issues.apache.org/jira/browse/HIVE-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-5284. Resolution: Won't Fix Agreed that is the correct solution long term. However, today I was not able to find any OI's without a default constructor so I must have been dreaming or looking at an old version of the code. Some ObjectInspectors do not have a default constructor --- Key: HIVE-5284 URL: https://issues.apache.org/jira/browse/HIVE-5284 Project: Hive Issue Type: Bug Reporter: Brock Noland Priority: Minor In HIVE-5263 we started using Kryo to clone the query plan. I thought I added default constructors to all object inspectors but it appears I missed a few. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4487: --- Status: Patch Available (was: Open) Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria Assignee: Chaoyu Tang Attachments: HIVE-4487.patch The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
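The suggested fix — making the scratch directory owner-only rather than relying on the umask — looks roughly like this on a local POSIX filesystem. The class name is hypothetical, and Hive's scratch dirs actually live on HDFS, where the analogous call would pass an explicit FsPermission:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch: create a scratch directory with explicit 700 permissions instead
// of inheriting whatever the process umask allows.
class ScratchDir {
    static Set<PosixFilePermission> ownerOnly() {
        return PosixFilePermissions.fromString("rwx------");
    }

    static Path createPrivate(Path dir) throws IOException {
        FileAttribute<Set<PosixFilePermission>> attr =
            PosixFilePermissions.asFileAttribute(ownerOnly());
        return Files.createDirectories(dir, attr);
    }
}
```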
[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-5285: Attachment: HIVE-5285.2.patch.txt The previous upload had an optimization fn hasAllFieldsSettable() which breaks a couple of tests because the checks are not precise. I am keeping things simple for now by avoiding any optimizations for creating a new SettableObjectInspector type. This would ensure correctness in all cases; however, future work would be to avoid creating a new SettableObjectInspector object every time we invoke getConvertedOI(). I have tested the changes with the partition_fileformat*.q tests and they pass on my local machine. Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach for the HIVE-5199 fix is correct. However, the fix for HIVE-5199 is incomplete. 
Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspectorNonSettableStructObjectInspector } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
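The patch's described approach — always substituting a settable object inspector rather than casting the source one — can be modeled abstractly. All types below are toy stand-ins for Hive's ObjectInspector hierarchy, not the real serde2 interfaces:

```java
// Toy model: if a supplied inspector is not settable, build a fresh
// standard settable implementation instead of casting (which is what
// throws the ClassCastException in the stack trace above).
interface ObjectInspector {}

interface SettableOI extends ObjectInspector {
    void set(Object value);
}

class StandardSettableOI implements SettableOI {
    Object value;
    public void set(Object value) { this.value = value; }
}

class ConvertedOI {
    static SettableOI ensureSettable(ObjectInspector oi) {
        if (oi instanceof SettableOI) {
            return (SettableOI) oi;       // already safe to cast
        }
        return new StandardSettableOI();  // substitute, never cast
    }
}
```

Always substituting trades some object churn for correctness, matching the comment above about avoiding a new SettableObjectInspector per getConvertedOI() call as future work.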
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Attachment: HIVE-5297.1.patch Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5288) Perflogger should log under single class
[ https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5288: --- Status: Patch Available (was: Open) Perflogger should log under single class Key: HIVE-5288 URL: https://issues.apache.org/jira/browse/HIVE-5288 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-5288.01.patch, HIVE-5288.01.patch, HIVE-5288.patch Perflogger should log under single class, so that it could be turned on without mass logging spew. Right now the log is passed to it externally; this could be preserved by passing in a string to be logged as part of the message. Anyway most of the time it's called from Driver and Utilities, which is a pretty useless class name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Status: Patch Available (was: Open) Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request 14155: HIVE-5297 Hive does not honor type for partition columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14155/ --- Review request for hive and Ashutosh Chauhan. Bugs: HIVE-5297 https://issues.apache.org/jira/browse/HIVE-5297 Repository: hive-git Description --- Hive does not consider the type of the partition column while writing partitions. Consider for example the query: create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION Diff: https://reviews.apache.org/r/14155/diff/ Testing --- Ran all tests. Thanks, Vikram Dixit Kumaraswamy
[jira] [Commented] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768730#comment-13768730 ] Thejas M Nair commented on HIVE-5295: - s/Vaibhar/Vaibhav/ HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: D12957.1.patch HiveConnection#configureConnection tries to execute statement even after it is closed. For remote JDBC client, it tries to set the conf var using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the 1st iteration through the conf var pairs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Interesting claims that seem untrue
Ed, If nothing else I'm glad it was interesting enough to generate some discussion. These sorts of stats are always subjects of a lot of controversy. I have seen a lot of these sorts of charts float around in confidential slide decks and I think it's good to have them out in the open where anyone can critique and correct them. In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing the analysis again I found that previously, due to a bug in my scripts, JIRAs that didn't have Hudson comments in them were not counted (this was one way it was identifying SVN commit IDs which I have since removed due to flakiness). Brock's patch was the single largest victim of this bug but not the only one, there were some from Cloudera, NexR, Hortonworks, Facebook even 2 from you Ed. The interested can see a full list of exclusions here: https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0. I apologize to those under-represented, there wasn't any intent on my part to minimize anyone's work. The impact in final totals is Cloudera +5.4%, NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog later today with relevant corrections. There is going to be continued interest in seeing charts like these, for example when Hive 12 is officially done. Sanjay suggested that LoC counts may not be the best way to represent true contribution. I agree that not all lines of code are created equal, for example a few monster patches recently went in re-arranging HCatalog namespaces and I think also indentation style. This (hopefully) mechanical work is not on the same footing as adding new query language features. Still it is work and wouldn't be fair to pretend it didn't happen. If anyone has ideas on better ways to fairly capture contribution I'm open to suggestions. On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo edlinuxg...@gmail.comwrote: I was reading the horton-works blog and found an interesting article. 
http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753 There is a very interesting graphic which attempts to demonstrate lines of code in the 12 release. http://hortonworks.com/wp-content/uploads/2013/09/hive4.png Although I do not know how they are calculated, they are probably counting code generated by tests output, but besides that they are wrong. One claim is that Cloudera contributed 4,244 lines of code. So to debunk that claim: In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from cloudera, created the ptest2 testing framework. He did all the work for ptest2 in hive 12, and it is clearly more then 4,244 This consists of 84 java files [edward@desksandra ptest2]$ find . -name *.java | wc -l 84 and by itself is 8001 lines of code. [edward@desksandra ptest2]$ find . -name *.java | xargs cat | wc -l 8001 [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch 7902 HIVE-4675.patch This is not the only feature from cloudera in hive 12. There is also a section of the article that talks of a ROAD MAP for hive features. I did not know we (hive) had a road map. I have advocated switching to feature based release and having a road map before, but it was suggested that might limit people from itch-scratching. -- Carter Shanklin Director, Product Management Hortonworks (M): +1.650.644.8795 (T): @cshanklin http://twitter.com/cshanklin -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
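For anyone who wants to reproduce the totals Ed computes above with `find ... | xargs cat | wc -l`, the same per-directory count can be scripted; this is a small sketch, not the methodology behind the blog's chart:

```python
from pathlib import Path

def count_java_lines(root):
    """Total physical lines across all .java files under root,
    equivalent to: find root -name "*.java" | xargs cat | wc -l"""
    return sum(len(p.read_text(errors="ignore").splitlines())
               for p in Path(root).rglob("*.java"))
```

Running it against a checkout of the ptest2 directory, for example `count_java_lines("ptest2")`, should match the `wc -l` numbers quoted in the thread.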
[jira] [Created] (HIVE-5297) Hive does not honor type for partition columns
Vikram Dixit K created HIVE-5297: Summary: Hive does not honor type for partition columns Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig
[ https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-4444: - Attachment: HIVE-4444-5.patch [HCatalog] WebHCat Hive should support equivalent parameters as Pig Key: HIVE-4444 URL: https://issues.apache.org/jira/browse/HIVE-4444 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4444-1.patch, HIVE-4444-2.patch, HIVE-4444-3.patch, HIVE-4444-4.patch, HIVE-4444-5.patch Currently there are no files and args parameters for Hive. We shall add them to make it similar to Pig. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig
[ https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768657#comment-13768657 ] Daniel Dai commented on HIVE-4444: -- Fixed. Sorry about that. [HCatalog] WebHCat Hive should support equivalent parameters as Pig Key: HIVE-4444 URL: https://issues.apache.org/jira/browse/HIVE-4444 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4444-1.patch, HIVE-4444-2.patch, HIVE-4444-3.patch, HIVE-4444-4.patch, HIVE-4444-5.patch Currently there are no files and args parameters for Hive. We shall add them to make it similar to Pig. NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Hive 0.12 release
I have got requests from multiple people for the inclusion of some non-blocker jiras in 0.12, and some of the ones I got around Friday are in the review phase. So I am planning to extend the deadline for inclusion of non-blocker jiras by another 2-3 days. I am hoping to get the patches for any blockers in by the middle of next week. On Thu, Aug 29, 2013 at 9:18 PM, Thejas Nair the...@hortonworks.com wrote: It has been more than 3 months since 0.11 was released and we already have 294 jiras in resolved-fixed state for 0.12. This includes several new features such as the date data type, optimizer improvements, ORC format improvements and many bug fixes. There are also many features that look ready to be committed soon, such as the varchar type. I think it is time to start preparing for a 0.12 release by creating a branch later next week and starting to stabilize it. What do people think? As we get closer to the branching, we can start discussing any additional features/bug fixes that we should add to the release and start monitoring their progress. Thanks, Thejas
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: HIVE-4340.4.patch.txt HIVE-4340-java-only.4.patch.txt ORC should provide raw data size Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the strip information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768739#comment-13768739 ] Prasanth J commented on HIVE-4340: -- I tried enhancing this patch to support SerDeStats in ORC in a slightly more efficient and less intrusive way. The current implementation of stats gathering happens for each row in processOp() method of FileSinkOperator. For each row, a new SerDeStats object is created and the stats are accumulated in a hashmap. This is good for cases where statistics gathering is not done by underlying storage format. But in case of ORC, ORC already gathers lots of statistics while writing the data which can be leveraged to provide SerDeStats. The statistics gathered by ORC can be retrieved in closeOp() method of FileSinkOperator making it more efficient than row by row processing of serde statistics. Uploaded patch implements the above approach. ORC should provide raw data size Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the strip information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
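Prasanth's comment contrasts per-row stats accumulation in processOp() with reading the writer's already-gathered totals once in closeOp(). A rough model of why both yield the same numbers (class and method names mimic FileSinkOperator/WriterImpl but are not Hive's API):

```python
class OrcWriter:
    """Stand-in for ORC's WriterImpl, which tallies stats as it writes."""
    def __init__(self):
        self.rows = 0
        self.raw_size = 0

    def write(self, row):
        self.rows += 1
        self.raw_size += sum(len(str(v)) for v in row)

    def stats(self):
        # Read once at close time (the closeOp() approach)
        return {"rowCount": self.rows, "rawDataSize": self.raw_size}

def per_row_stats(rows):
    # processOp() style: update a stats record for every single row
    total = {"rowCount": 0, "rawDataSize": 0}
    for row in rows:
        total["rowCount"] += 1
        total["rawDataSize"] += sum(len(str(v)) for v in row)
    return total

rows = [(1, "ab"), (2, "cde")]
w = OrcWriter()
for r in rows:
    w.write(r)
assert w.stats() == per_row_stats(rows)  # same totals, one read at close
```

The efficiency argument is that the writer is accumulating these counters anyway, so the operator can skip the per-row bookkeeping entirely.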
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Description: Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the load happens. was: Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. 
We should throw an exception for such user errors at the time the load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
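The check HIVE-5297 argues for can be sketched as validating each partition-spec value against the declared partition column type before accepting the DDL; the schema and error text below are illustrative, not Hive's:

```python
# Declared partition columns from: PARTITIONED BY (month string, day int)
partition_schema = {"month": "string", "day": "int"}

def validate_partition_spec(spec):
    """Reject a partition spec whose values don't match the column types."""
    for col, value in spec.items():
        if partition_schema[col] == "int":
            try:
                int(value)
            except ValueError:
                raise ValueError(
                    f"partition column '{col}' declared int, got '{value}'")

validate_partition_spec({"month": "June", "day": "2"})  # accepted
try:
    validate_partition_spec({"month": "June", "day": "second"})
except ValueError as e:
    print(e)  # partition column 'day' declared int, got 'second'
```

With this kind of check, the `day='second'` example in the description would fail at ADD PARTITION time instead of silently producing nulls on read.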
Review Request 14162: HIVE-4340: ORC should provide raw data size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14162/ --- Review request for hive, Ashutosh Chauhan and Owen O'Malley. Bugs: HIVE-4340 https://issues.apache.org/jira/browse/HIVE-4340 Repository: hive-git Description --- ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the strip information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91 ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 ql/src/test/resources/orc-file-dump.out fac5326 serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3 Diff: https://reviews.apache.org/r/14162/diff/ Testing --- All unit 
tests and q file tests related to ORC are passing. Thanks, Prasanth_J
[jira] [Commented] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768729#comment-13768729 ] Thejas M Nair commented on HIVE-5295: - Vaibhar, can you also please add tests for the fix ? HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: D12957.1.patch HiveConnection#configureConnection tries to execute statement even after it is closed. For remote JDBC client, it tries to set the conf var using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the 1st iteration through the conf var pairs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768798#comment-13768798 ] Ashutosh Chauhan commented on HIVE-5297: Can you create a phabricator entry for this? Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768839#comment-13768839 ] Eric Hanson commented on HIVE-4961: --- As far as I can tell, the 11 test failures report in the last test run are not related to this patch. Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4961.1-vectorization.patch, HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
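The bridge idea in HIVE-4961 is to wrap an existing scalar custom UDF so a vectorized plan can apply it to a whole batch of values at once. A toy model of the pattern (Hive's actual adaptor operates on ColumnVector batches in Java; the names here are illustrative only):

```python
def my_udf(x):
    # Stand-in for a user's existing scalar custom UDF
    return x * 2 + 1

def vectorize(scalar_udf):
    """Wrap a scalar UDF so it can be applied to a whole column batch."""
    def bridge(column):
        return [scalar_udf(v) for v in column]
    return bridge

batch_udf = vectorize(my_udf)
print(batch_udf([1, 2, 3]))  # [3, 5, 7]
```

The point of the bridge is that users keep their existing UDF code; a purpose-built vectorized UDF interface (the separate JIRA Eric mentions) would avoid the per-element call overhead.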
[jira] [Commented] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.
[ https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768844#comment-13768844 ] Hive QA commented on HIVE-5285: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603387/HIVE-5285.2.patch.txt {color:green}SUCCESS:{color} +1 3125 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/765/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/765/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors. --- Key: HIVE-5285 URL: https://issues.apache.org/jira/browse/HIVE-5285 Project: Hive Issue Type: Bug Affects Versions: 0.11.1 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt The approach for HIVE-5199 fix is correct.However, the fix for HIVE-5199 is incomplete. 
Consider a complex nested structure containing the following object inspector hierarchy: SettableStructObjectInspector { ListObjectInspectorNonSettableStructObjectInspector } In the above case, the cast exception can happen via MapOperator/FetchOperator as below: java.io.IOException: java.lang.ClassCastException: com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.lang.ClassCastException: com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529) ... 13 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
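The cast failure above happens because the converter path unconditionally casts the output inspector to a settable type, and custom SerDes may supply only non-settable inspectors nested inside a settable one. A loose Python model of the problem and the fix direction (Hive's real classes are Java; the names below are illustrative only):

```python
class MapObjectInspector:
    """Non-settable: a read-only view of a map, as a custom SerDe may provide."""
    pass

class SettableMapObjectInspector(MapObjectInspector):
    """Settable variant that converters can write into."""
    def create(self):
        return {}

def get_converter(output_oi):
    # Buggy path: an unconditional cast to the settable type fails with
    # the equivalent of Java's ClassCastException when a custom SerDe
    # supplies only a non-settable inspector.
    if not isinstance(output_oi, SettableMapObjectInspector):
        # HIVE-5285's direction: recursively substitute an equivalent
        # settable inspector instead of casting.
        output_oi = SettableMapObjectInspector()
    return output_oi

oi = get_converter(MapObjectInspector())
print(isinstance(oi, SettableMapObjectInspector))  # True
```

The HIVE-5199 fix handled the top-level struct; the point of this issue is that the substitution has to recurse into nested list/map/struct inspectors as well.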
[jira] [Created] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
Xuefu Zhang created HIVE-5298: - Summary: AvroSerde performance problem caused by HIVE-3833 Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 HIVE-3833 fixed the targeted problem and made Hive to use partition-level metadata to initialize object inspector. In doing that, however, it goes thru every file under the table to access the partition metadata, which is very inefficient, especially in case of multiple files per partition. This causes more problem for AvroSerde because AvroSerde initialization accesses schema, which is located on file system. As a result, before hive can process any data, it needs to access every file for a table, which can take long enough to cause job failure because of lack of job progress. The improvement can be made so that partition metadata is only access once per partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
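The improvement described can be sketched as memoizing the partition-metadata fetch so it happens once per partition rather than once per file; the loader function below is hypothetical (AvroSerde's real initialization reads the schema from the file system, which is the expensive step being repeated):

```python
from functools import lru_cache

fetches = []

@lru_cache(maxsize=None)
def load_partition_metadata(partition):
    fetches.append(partition)  # stands in for a file-system/schema read
    return {"avro.schema": f"schema-for-{partition}"}

# Three files, but only two distinct partitions
files = ["p=1/part-0", "p=1/part-1", "p=2/part-0"]
for f in files:
    load_partition_metadata(f.split("/")[0])

print(len(fetches))  # 2 — one metadata read per partition, not per file
```

For a table with many files per partition, this turns O(files) schema fetches into O(partitions), which is what the issue asks for.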
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: (was: HIVE-4340-java-only.4.patch.txt)

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14162: HIVE-4340: ORC should provide raw data size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14162/ ---

(Updated Sept. 16, 2013, 10:10 p.m.)

Review request for hive, Ashutosh Chauhan and Owen O'Malley.

Changes
---
Added UNION case to ORC writer raw data size computation.

Bugs: HIVE-4340
https://issues.apache.org/jira/browse/HIVE-4340

Repository: hive-git

Description
---
ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

Diffs (updated)
---
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201
ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02
ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32
ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a
ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91
ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce
ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION
ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132
ql/src/test/resources/orc-file-dump.out fac5326
serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3

Diff: https://reviews.apache.org/r/14162/diff/

Testing
---
All unit tests and q file tests related to ORC are passing.

Thanks,
Prasanth_J
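The raw-data-size accounting described in this review — a per-row size summed into the current stripe and recorded in the stripe information on flush — can be sketched as follows. The per-type byte costs are illustrative constants, not ORC's actual accounting:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-stripe raw data size bookkeeping for a writer
// handling rows of (string, long). Byte costs are illustrative only.
public class RawSizeSketch {
    static final int LONG_BYTES = 8;

    long currentStripeRawSize = 0;
    final List<Long> stripeRawSizes = new ArrayList<>(); // per-stripe record

    long rawSizeOf(String text, long number) {
        // strings cost ~2 bytes per char (Java chars); a long costs a fixed 8 bytes
        return 2L * text.length() + LONG_BYTES;
    }

    void addRow(String text, long number) {
        currentStripeRawSize += rawSizeOf(text, number);
    }

    void flushStripe() {
        // record the stripe subtotal, as the review proposes for stripe information
        stripeRawSizes.add(currentStripeRawSize);
        currentStripeRawSize = 0;
    }

    long totalRawSize() {
        return stripeRawSizes.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        RawSizeSketch w = new RawSizeSketch();
        w.addRow("ab", 1);  // 2*2 + 8 = 12
        w.addRow("cde", 2); // 2*3 + 8 = 14
        w.flushStripe();
        w.addRow("f", 3);   // 2*1 + 8 = 10
        w.flushStripe();
        System.out.println(w.stripeRawSizes); // [26, 10]
        System.out.println(w.totalRawSize()); // 36
    }
}
```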
[jira] [Commented] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768848#comment-13768848 ] Prasanth J commented on HIVE-4340: -- Added UNION case to ORC writer raw data size computation.

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: HIVE-4340.4.patch.txt HIVE-4340-java-only.4.patch.txt

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: (was: HIVE-4340.4.patch.txt)

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4531: - Status: Open (was: Patch Available)

[WebHCat] Collecting task logs to hdfs
Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz

It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768853#comment-13768853 ] Eugene Koifman commented on HIVE-4531: -- WebHCat e2e + HCat unit tests pass.

[WebHCat] Collecting task logs to hdfs
Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz

It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14162: HIVE-4340: ORC should provide raw data size
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14162/ ---

(Updated Sept. 16, 2013, 10:17 p.m.)

Review request for hive, Ashutosh Chauhan and Owen O'Malley.

Changes
---
The earlier patch didn't apply cleanly. Reuploading a new one.

Bugs: HIVE-4340
https://issues.apache.org/jira/browse/HIVE-4340

Repository: hive-git

Description
---
ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

Diffs (updated)
---
ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201
ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02
ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32
ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a
ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91
ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce
ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION
ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132
ql/src/test/resources/orc-file-dump.out fac5326
serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3

Diff: https://reviews.apache.org/r/14162/diff/

Testing
---
All unit tests and q file tests related to ORC are passing.

Thanks,
Prasanth_J
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: (was: HIVE-4340.4.patch.txt)

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: (was: HIVE-4340-java-only.4.patch.txt)

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4340: - Attachment: HIVE-4340.4.patch.txt HIVE-4340-java-only.4.patch.txt

The earlier patch upload didn't apply cleanly. Reuploading a new one.

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5206) Support parameterized primitive types
[ https://issues.apache.org/jira/browse/HIVE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5206: - Attachment: HIVE-5206.v12.1.patch attaching HIVE-5206.v12.1.patch, for use in 0.12 branch Support parameterized primitive types - Key: HIVE-5206 URL: https://issues.apache.org/jira/browse/HIVE-5206 Project: Hive Issue Type: Improvement Components: Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: HIVE-5206.1.patch, HIVE-5206.2.patch, HIVE-5206.3.patch, HIVE-5206.4.patch, HIVE-5206.D12693.1.patch, HIVE-5206.v12.1.patch Support for parameterized types is needed for char/varchar/decimal support. This adds a type parameters value to the PrimitiveTypeEntry/PrimitiveTypeInfo/PrimitiveObjectInspector objects. NO PRECOMMIT TESTS - dependent on HIVE-5203/HIVE-5204 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4844) Add varchar data type
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: HIVE-4844.v12.1.patch attaching HIVE-4844.v12.1.patch, for use in 0.12 branch Add varchar data type - Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: HIVE-4844.10.patch, HIVE-4844.11.patch, HIVE-4844.12.patch, HIVE-4844.13.patch, HIVE-4844.14.patch, HIVE-4844.15.patch, HIVE-4844.16.patch, HIVE-4844.17.patch, HIVE-4844.18.patch, HIVE-4844.19.patch, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, HIVE-4844.D12699.1.patch, HIVE-4844.D12891.1.patch, HIVE-4844.v12.1.patch, screenshot.png Add new varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. Char type will be added as another task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support
[ https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5278: - Attachment: HIVE-5278.v12.1.patch attaching HIVE-5278.v12.1.patch, for use in 0.12 branch Move some string UDFs to GenericUDFs, for better varchar support Key: HIVE-5278 URL: https://issues.apache.org/jira/browse/HIVE-5278 Project: Hive Issue Type: Improvement Components: Types, UDF Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch, HIVE-5278.v12.1.patch To better support varchar/char types in string UDFs, select UDFs should be converted to GenericUDFs. This allows the UDF to return the resulting char/varchar length in the type metadata. This work is being split off as a separate task from HIVE-4844. The initial UDFs as part of this work are concat/lower/upper. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type
[ https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5161: - Attachment: HIVE-5161.v12.1.patch attaching HIVE-5161.v12.1.patch, for use in 0.12 branch. Code generated using protobuf-2.4 Additional SerDe support for varchar type - Key: HIVE-5161 URL: https://issues.apache.org/jira/browse/HIVE-5161 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Types Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: D12897.1.patch, HIVE-5161.1.patch, HIVE-5161.2.patch, HIVE-5161.3.patch, HIVE-5161.v12.1.patch Breaking out support for varchar for the various SerDes as an additional task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4340) ORC should provide raw data size
[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768786#comment-13768786 ] Prasanth J commented on HIVE-4340: -- Review board entry: https://reviews.apache.org/r/14162

ORC should provide raw data size
Key: HIVE-4340 URL: https://issues.apache.org/jira/browse/HIVE-4340 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data size. WriterImpl, however, has enough information to provide one. WriterImpl should compute a raw data size for each row, aggregate them per stripe and record it in the stripe information, as RC currently does in its key header, and allow the FileSinkOperator access to the size per row. FileSinkOperator should be able to get the raw data size from either the SerDe or the RecordWriter when the RecordWriter can provide it.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem
[ https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-5070: -- Attachment: HIVE-5070-v2.patch

V2 of the patch uploaded. To minimize code duplication, I created a ProxyFileSystemBase class that the 0.20, 0.20S, and 0.23 shims all reuse, with the 0.23 shim overriding the listLocatedStatus() method.

Need to implement listLocatedStatus() in ProxyFileSystem
Key: HIVE-5070 URL: https://issues.apache.org/jira/browse/HIVE-5070 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: shanyu zhao Fix For: 0.11.1 Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch

MAPREDUCE-1981 introduced a new FileSystem API, listLocatedStatus(). It is used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class needs to implement this API in order to make the Hive unit tests work. Otherwise, you'll see exceptions like the following when running the TestCliDriver test case, e.g. when running allcolref_in_udf.q:

[junit] Running org.apache.hadoop.hive.cli.TestCliDriver
[junit] Begin query: allcolref_in_udf.q
[junit] java.lang.IllegalArgumentException: Wrong FS: pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, expected: file:///
[junit] at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
[junit] at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
[junit] at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
[junit] at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
[junit] at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1798)
[junit] at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797)
[junit] at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579)
[junit] at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
[junit] at org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
[junit] at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
[junit] at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
[junit] at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
[junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385)
[junit] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351)
[junit] at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389)
[junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
[junit] at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
[junit] at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
[junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
[junit] at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
[junit] at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
[junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
[junit] at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at javax.security.auth.Subject.doAs(Subject.java:396)
[junit] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
[junit] at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552)
[junit] at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543)
[junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
[junit] at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]
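The "Wrong FS" failure can be modeled in miniature: a proxy filesystem that lacks its own listing method hands its pfile: path verbatim to the underlying file: filesystem, whose path check rejects it; the fix is for the proxy to implement the method and translate the scheme before delegating. This sketch uses hypothetical stand-in classes, not Hadoop's real FileSystem API:

```java
// Hypothetical, much-simplified model of ProxyFileSystem path translation.
public class ProxyListingSketch {
    // Stand-in for the underlying local filesystem with its scheme check.
    static class RawFs {
        String[] listLocatedStatus(String path) {
            if (!path.startsWith("file:")) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: file:///");
            }
            return new String[] { path + "/part-00000" };
        }
    }

    // Stand-in for the proxy: implements the listing entry point itself and
    // rewrites its own "pfile:" scheme back to "file:" before delegating.
    static class ProxyFs {
        final RawFs underlying = new RawFs();

        String[] listLocatedStatus(String proxyPath) {
            String real = proxyPath.replaceFirst("^pfile:", "file:");
            return underlying.listLocatedStatus(real);
        }
    }

    public static void main(String[] args) {
        // Without translation, RawFs would throw exactly the "Wrong FS" error.
        System.out.println(new ProxyFs().listLocatedStatus("pfile:/warehouse/src")[0]);
    }
}
```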
[jira] [Commented] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768920#comment-13768920 ] Vaibhav Gumashta commented on HIVE-4763: [~cwsteinbach] [~thejas] I've uploaded another wip patch. Now fixing the test suite changes: OOM exception + reorganizing the test classes. add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch Subtask for adding support for http transport mode for thrift api in hive server2. Support for the different authentication modes will be part of another subtask. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768919#comment-13768919 ] Kousuke Saruta commented on HIVE-5296: -- I have some questions. 1. Where does the memory leak occur: on the server side (the HiveServer2 process) or the client side? 2. What query did you execute? 3. If you have already investigated, could you tell me which objects increase?

Memory leak: OOM Error after multiple open/closed JDBC connections.
Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Original Estimate: 168h Remaining Estimate: 168h

This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. With multiple connections to HiveServer2, all of which are closed and disposed of properly, the Java heap size still grows extremely quickly. This issue can be recreated using the following code:

{code}
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection.
 */
public class HiveClient {

    Connection con;
    Logger logger;
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private String db;

    public HiveClient(String db) {
        logger = Logger.getLogger(HiveClient.class);
        this.db = db;
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection" + e.getMessage());
        }
    }

    public int update(String query) {
        Integer res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
            stmt.close();
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                    + "but check the error: %s from the query %s", query, e.toString()));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        }
        if (stmt != null)
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("Cannot close the statement, potentially a memory leak " + e);
            }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
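Since JDBC Connection and Statement are AutoCloseable, the manual close() calls scattered through the client above are easy to get wrong; try-with-resources guarantees close() on every exit path, including exceptions. A sketch of that pattern with a hypothetical FakeConnection stand-in, so the example runs without a Hive server:

```java
// Sketch: try-with-resources vs. manual close. FakeConnection is a stand-in
// for java.sql.Connection; openCount tracks resources that were never closed.
public class TryWithResourcesSketch {
    static class FakeConnection implements AutoCloseable {
        static int openCount = 0;
        FakeConnection() { openCount++; }

        int executeUpdate(String sql) {
            if (sql.isEmpty()) throw new IllegalArgumentException("empty query");
            return 1;
        }

        @Override public void close() { openCount--; }
    }

    static int update(String sql) {
        // The connection is closed whether executeUpdate succeeds or throws.
        try (FakeConnection con = new FakeConnection()) {
            return con.executeUpdate(sql);
        } catch (IllegalArgumentException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        update("USE db");
        update(""); // failing path still releases the connection
        System.out.println(FakeConnection.openCount); // 0: nothing leaked
    }
}
```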
[jira] [Updated] (HIVE-5267) Use array instead of Collections if possible in DemuxOperator
[ https://issues.apache.org/jira/browse/HIVE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5267: --- Attachment: HIVE-5267.patch

Navis, I am uploading a new patch (HIVE-5267.patch) which includes the change I mentioned in Phabricator.

Use array instead of Collections if possible in DemuxOperator
Key: HIVE-5267 URL: https://issues.apache.org/jira/browse/HIVE-5267 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-5267.D12867.1.patch, HIVE-5267.patch

DemuxOperator accesses Maps two or more times for each row; these lookups can be replaced by array accesses.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K reassigned HIVE-5298: Assignee: Vikram Dixit K (was: Xuefu Zhang) AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Vikram Dixit K Fix For: 0.13.0 HIVE-3833 fixed the targeted problem and made Hive to use partition-level metadata to initialize object inspector. In doing that, however, it goes thru every file under the table to access the partition metadata, which is very inefficient, especially in case of multiple files per partition. This causes more problem for AvroSerde because AvroSerde initialization accesses schema, which is located on file system. As a result, before hive can process any data, it needs to access every file for a table, which can take long enough to cause job failure because of lack of job progress. The improvement can be made so that partition metadata is only access once per partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5298: - Assignee: Xuefu Zhang (was: Vikram Dixit K) AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 HIVE-3833 fixed the targeted problem and made Hive to use partition-level metadata to initialize object inspector. In doing that, however, it goes thru every file under the table to access the partition metadata, which is very inefficient, especially in case of multiple files per partition. This causes more problem for AvroSerde because AvroSerde initialization accesses schema, which is located on file system. As a result, before hive can process any data, it needs to access every file for a table, which can take long enough to cause job failure because of lack of job progress. The improvement can be made so that partition metadata is only access once per partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-5298: -- Attachment: HIVE-5298.patch Initial patch. Running tests. Will submit patch if tests pass. AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5298.patch HIVE-3833 fixed the targeted problem and made Hive to use partition-level metadata to initialize object inspector. In doing that, however, it goes thru every file under the table to access the partition metadata, which is very inefficient, especially in case of multiple files per partition. This causes more problem for AvroSerde because AvroSerde initialization accesses schema, which is located on file system. As a result, before hive can process any data, it needs to access every file for a table, which can take long enough to cause job failure because of lack of job progress. The improvement can be made so that partition metadata is only access once per partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833
[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768934#comment-13768934 ] Vikram Dixit K commented on HIVE-5298: -- Sorry about the assignment change. Some accidental typing/clicking. I do not know what caused it. Assigned back to Xuefu. AvroSerde performance problem caused by HIVE-3833 - Key: HIVE-5298 URL: https://issues.apache.org/jira/browse/HIVE-5298 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5298.patch HIVE-3833 fixed the targeted problem and made Hive to use partition-level metadata to initialize object inspector. In doing that, however, it goes thru every file under the table to access the partition metadata, which is very inefficient, especially in case of multiple files per partition. This causes more problem for AvroSerde because AvroSerde initialization accesses schema, which is located on file system. As a result, before hive can process any data, it needs to access every file for a table, which can take long enough to cause job failure because of lack of job progress. The improvement can be made so that partition metadata is only access once per partition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
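The fix direction described in HIVE-5298 — fetch partition metadata once per partition instead of once per file — amounts to memoizing an expensive file-system lookup by partition key. A minimal sketch under that assumption (class and method names are illustrative, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PartitionMetadataCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader; // expensive: hits the file system
    final AtomicInteger loads = new AtomicInteger();

    PartitionMetadataCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Each file carries its partition's path; the metadata is loaded at most
    // once per partition, no matter how many files the partition holds.
    String metadataFor(String partitionPath) {
        return cache.computeIfAbsent(partitionPath, p -> {
            loads.incrementAndGet();
            return loader.apply(p);
        });
    }

    public static void main(String[] args) {
        PartitionMetadataCache c = new PartitionMetadataCache(p -> "schema-of:" + p);
        // Three files in the same partition -> a single loader call.
        c.metadataFor("/warehouse/t/dt=2013-09-16");
        c.metadataFor("/warehouse/t/dt=2013-09-16");
        c.metadataFor("/warehouse/t/dt=2013-09-16");
        System.out.println(c.loads.get()); // 1
    }
}
```

For AvroSerde specifically this matters because the loader would otherwise re-read the schema file from the file system for every data file before any processing starts.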
Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14155/#review26159 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/14155/#comment51110 why false by default? ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/14155/#comment5 nit: the return seems pointless, it always returns the same map that caller already has ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/14155/#comment51112 wouldn't it re-put the entire map into itself as it stands now ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/14155/#comment51113 not really familiar with this flavor of trees; is val guaranteed to be the 2nd argument? Can it be reverse. Also nit above - checks for 0 children but not for 1. ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/14155/#comment51114 nit: use entrySet? ql/src/test/queries/clientnegative/illegal_partition_type.q https://reviews.apache.org/r/14155/#comment51115 local path, also below in other q files - Sergey Shelukhin On Sept. 16, 2013, 9:05 p.m., Vikram Dixit Kumaraswamy wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14155/ --- (Updated Sept. 16, 2013, 9:05 p.m.) Review request for hive and Ashutosh Chauhan. Bugs: HIVE-5297 https://issues.apache.org/jira/browse/HIVE-5297 Repository: hive-git Description --- Hive does not consider the type of the partition column while writing partitions. Consider for example the query: create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION Diff: https://reviews.apache.org/r/14155/diff/ Testing --- Ran all tests. Thanks, Vikram Dixit Kumaraswamy
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768940#comment-13768940 ] Sergey Shelukhin commented on HIVE-5297: some comments on rb Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
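The check the ticket asks for — reject `day='second'` when `day` is declared `int` at partition-add time — boils down to validating the literal against the declared column type before accepting the partition. A hedged sketch of that idea (illustrative names, not Hive's actual type-checking API):

```java
public class PartitionTypeCheck {
    // Returns whether a partition value string conforms to the declared
    // column type. Only a few primitive types are sketched here.
    static boolean conformsTo(String value, String type) {
        try {
            switch (type) {
                case "int":    Integer.parseInt(value);   return true;
                case "bigint": Long.parseLong(value);     return true;
                case "double": Double.parseDouble(value); return true;
                case "string": return true;
                default:       return false; // unknown type: reject in this sketch
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // month string, day int -> month='June' is fine, day='second' is not
        System.out.println(conformsTo("June", "string")); // true
        System.out.println(conformsTo("second", "int"));  // false
        System.out.println(conformsTo("2", "int"));       // true
    }
}
```

Failing fast here surfaces the user error at `ALTER TABLE ... ADD PARTITION` time instead of silently producing nulls on later inserts.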
[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768964#comment-13768964 ] Hive QA commented on HIVE-5246: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603400/HIVE-5246.1.patch {color:green}SUCCESS:{color} +1 3097 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/767/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/767/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar For a Hive query started by Oozie Hive action, the local task submitted for Mapjoin fails. The HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets env variable HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config property. However this doesn't get passed down to the child JVM which causes the problem. This is similar issue addressed by HIVE-4343 which address the problem HiveServer2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768969#comment-13768969 ] Eric Hanson commented on HIVE-4961: --- I ran the failing tests on my machine on a clean version of the vectorization branch without my patch. These tests failed: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump These tests would not run in a way that produced output in ant testreport, and my changes should not affect them. org.apache.hcatalog.listener.TestNotificationListener.testAMQListener org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask org.apache.hive.hcatalog.pig.TestHCatStorer.testPartColsInData org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreInPartiitonedTbl org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreMultiTables org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreWithNoSchema Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4961.1-vectorization.patch, HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. 
It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
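The bridge described above keeps the surrounding pipeline in batch mode while invoking the scalar UDF once per row inside the vectorized operator. A minimal sketch of that shape, with a plain function standing in for a real Hive UDF (which would go through ObjectInspectors — everything here is illustrative):

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

public class VectorUdfBridge {
    // Bridge: apply a row-at-a-time UDF across an entire column batch,
    // so the operators before and after can stay vectorized.
    static int[] evaluateBatch(IntUnaryOperator scalarUdf, int[] columnBatch) {
        int[] out = new int[columnBatch.length];
        for (int i = 0; i < columnBatch.length; i++) {
            out[i] = scalarUdf.applyAsInt(columnBatch[i]); // one scalar call per row
        }
        return out;
    }

    public static void main(String[] args) {
        // A custom "myUDF" the user registered, here just squaring its input.
        int[] result = evaluateBatch(x -> x * x, new int[] {1, 2, 3});
        System.out.println(Arrays.toString(result)); // [1, 4, 9]
    }
}
```

This is why the follow-on idea of natively vectorized UDFs is a separate JIRA: the bridge preserves compatibility but still pays one virtual call per row.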
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768977#comment-13768977 ] Eugene Koifman commented on HIVE-5167: -- [~thejas]Couldn't HIVE_HOME logic be simpler? In pseudo code: if(!isset(HIVE_HOME)) { set HIVE_HOME = DEFAULT_HIVE_HOME else { //do nothing; just use this assuming the user set this intentionally } // may optionally check that HIVE_HOME/bin/hive exists webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS
[ https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768979#comment-13768979 ] Brock Noland commented on HIVE-5246: +1 Local task for map join submitted via oozie job fails on a secure HDFS --- Key: HIVE-5246 URL: https://issues.apache.org/jira/browse/HIVE-5246 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar For a Hive query started by Oozie Hive action, the local task submitted for Mapjoin fails. The HDFS delegation token is not shared properly with the child JVM created for the local task. Oozie creates a delegation token for the Hive action and sets env variable HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config property. However this doesn't get passed down to the child JVM which causes the problem. This is similar issue addressed by HIVE-4343 which address the problem HiveServer2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768982#comment-13768982 ] Thejas M Nair commented on HIVE-5167: - Setting HIVE_HOME to DEFAULT_HIVE_HOME when the DEFAULT_HIVE_HOME location is not valid will break hcat scripts. This is because they will assume the user knows best and try to use the already set HIVE_HOME location. webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768986#comment-13768986 ] Eugene Koifman commented on HIVE-5167: -- OK, so that is what // may optionally check that HIVE_HOME/bin/hive exists would do webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769010#comment-13769010 ] Eugene Koifman commented on HIVE-5167: -- hcat script also sets a default for HIVE_HOME (if not set already) so in webhcat_config.sh we should only set HIVE_HOME if it contains bin/hive webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh
[ https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769027#comment-13769027 ] Eugene Koifman commented on HIVE-5167: -- The existing patch already does what I say in my last comment +1 webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh --- Key: HIVE-5167 URL: https://issues.apache.org/jira/browse/HIVE-5167 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch HIVE-4820 introduced checks for env variables, but it does so before sourcing webhcat-env.sh. This order needs to be reversed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
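The defaulting rule the thread converges on — keep a user-set HIVE_HOME untouched, and fall back to the default only when the default actually looks valid (e.g. contains bin/hive) — can be stated as a pure function. A sketch in Java rather than shell, purely to make the logic explicit (names hypothetical):

```java
import java.util.function.Predicate;

public class HiveHomeResolver {
    // If the user set HIVE_HOME, trust it ("the user knows best").
    // Otherwise use the default only when it passes the validity check;
    // a null result means no usable HIVE_HOME was found.
    static String resolve(String current, String dflt, Predicate<String> looksValid) {
        if (current != null && !current.isEmpty()) {
            return current;
        }
        return looksValid.test(dflt) ? dflt : null;
    }

    public static void main(String[] args) {
        Predicate<String> valid = p -> p.endsWith("/hive"); // stand-in for "bin/hive exists"
        System.out.println(resolve("/opt/myhive", "/usr/lib/hive", valid)); // /opt/myhive
        System.out.println(resolve(null, "/usr/lib/hive", valid));          // /usr/lib/hive
        System.out.println(resolve(null, "/bad/path", valid));              // null
    }
}
```

Note the asymmetry the thread settles on: validity is checked only for the default, never for a value the user set explicitly.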
[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769044#comment-13769044 ] Ashutosh Chauhan commented on HIVE-4961: hcatalog tests are flaky and we can ignore them. But, none of hive tests fail in trunk. Its not likely related to your patch though. I have seen {{input4.q}} and {{plan_json.q}} to fail consistently only on vectorization branch, so they need to be debugged on branch. orc tests I am not sure, but if they fail regardless of patch, I think this patch is good to go. Create bridge for custom UDFs to operate in vectorized mode --- Key: HIVE-4961 URL: https://issues.apache.org/jira/browse/HIVE-4961 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4961.1-vectorization.patch, HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch Suppose you have a custom UDF myUDF() that you've created to extend hive. The goal of this JIRA is to create a facility where if you run a query that uses myUDF() in an expression, the query will run in vectorized mode. This would be a general-purpose bridge for custom UDFs that users add to Hive. It would work with existing UDFs. I'm considering a separate JIRA for a new kind of custom UDF implementation that is vectorized from the beginning, to optimize performance. That is not covered by this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769049#comment-13769049 ] Mohammad Kamrul Islam commented on HIVE-4732: - [~appodictic]: I can see your point. Indeed a very informative link. As the link mentioned, the probability of an ID collision is very, very low. Pasted from wikipedia: To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion,[38] which means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs. With these probabilities, is it necessary to make things complex? Moreover, these IDs are often few in one hive session. Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
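The collision odds quoted from Wikipedia come from the standard birthday-bound approximation over the 122 random bits of a version-4 UUID: p ≈ 1 − exp(−n(n−1) / (2 · 2^122)). A small sketch of that arithmetic (the formula is the standard approximation, not anything in the Hive patch):

```java
public class UuidCollision {
    // Birthday-bound approximation for the chance of at least one collision
    // among n random version-4 UUIDs (122 random bits).
    static double collisionProbability(double n) {
        double space = Math.pow(2, 122);
        // expm1 keeps precision when the exponent is tiny:
        // 1 - exp(-x) == -expm1(-x)
        return -Math.expm1(-(n * (n - 1)) / (2 * space));
    }

    public static void main(String[] args) {
        // Even a trillion UUIDs leave the collision chance far below 1e-10,
        // which is the point the comment above is making.
        System.out.println(collisionProbability(1e12) < 1e-10); // true
        // More UUIDs -> strictly higher collision probability.
        System.out.println(collisionProbability(1e18) > collisionProbability(1e12)); // true
    }
}
```

With only a handful of IDs per Hive session, n is minuscule and the collision risk is negligible for practical purposes.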
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Attachment: HIVE-4732.5.patch Fixed the failed testcase. Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, HIVE-4732.v1.patch, HIVE-4732.v4.patch The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
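The optimization HIVE-4732 describes — compute a hash once, reuse it — works as a cheap negative filter in front of the expensive full comparison: differing hashes prove inequality instantly, while equal hashes still require the slow equals() since hashes can collide. A minimal sketch with a string standing in for an Avro Schema (names illustrative, not the actual patch):

```java
public class CachedSchema {
    private final String schemaJson; // stand-in for an expensive Avro Schema
    private final int cachedHash;    // computed once, reused on every check

    CachedSchema(String schemaJson) {
        this.schemaJson = schemaJson;
        this.cachedHash = schemaJson.hashCode();
    }

    boolean sameAs(CachedSchema other) {
        if (cachedHash != other.cachedHash) {
            return false; // fast path: differing hashes -> definitely unequal
        }
        return schemaJson.equals(other.schemaJson); // slow path, rarely taken
    }

    public static void main(String[] args) {
        CachedSchema a = new CachedSchema("{\"type\":\"record\"}");
        CachedSchema b = new CachedSchema("{\"type\":\"record\"}");
        CachedSchema c = new CachedSchema("{\"type\":\"enum\"}");
        System.out.println(a.sameAs(b)); // true
        System.out.println(a.sameAs(c)); // false
    }
}
```

Since the common case in a SerDe is comparing against the same schema over and over, almost every check resolves on the cached-hash fast path.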
[jira] [Commented] (HIVE-4512) The vectorized plan is not picking right expression class for string concatenation.
[ https://issues.apache.org/jira/browse/HIVE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769053#comment-13769053 ] Hive QA commented on HIVE-4512: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12603399/HIVE-4512.3-vectorization.patch {color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 3951 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump org.apache.hcatalog.api.TestHCatClient.testBasicDDLCommands org.apache.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl org.apache.hive.hcatalog.api.TestHCatClient.testBasicDDLCommands org.apache.hive.hcatalog.api.TestHCatClient.testDatabaseLocation org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSchema org.apache.hive.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTable org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask org.apache.hive.hcatalog.mapreduce.TestHCatExternalPartitioned.testHCatPartitionedTable org.apache.hive.hcatalog.pig.TestHCatLoader.testGetInputBytes org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/768/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/768/console Messages: {noformat} Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 17 tests failed {noformat} This message is automatically generated. The vectorized plan is not picking right expression class for string concatenation. --- Key: HIVE-4512 URL: https://issues.apache.org/jira/browse/HIVE-4512 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Attachments: HIVE-4512.1-vectorization.patch, HIVE-4512.2-vectorization.patch, HIVE-4512.3-vectorization.patch The vectorized plan is not picking right expression class for string concatenation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769056#comment-13769056 ] Navis commented on HIVE-5279: - Oh, I'll check that. Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Attachments: 5279.patch, D12963.1.patch We didn't forced GenericUDAFEvaluator to be Serializable. I don't know how previous serialization mechanism solved this but, kryo complaints that it's not Serializable and fails the query. The log below is the example, {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed in somehow, some UDAFs should be modified to be run on hive-0.13.0 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5297: - Attachment: HIVE-5297.2.patch Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch Hive does not consider the type of the partition column while writing partitions. Consider for example the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. We should throw an exception on such user error at the time the partition addition/load happens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14155/ --- (Updated Sept. 17, 2013, 1:14 a.m.) Review request for hive and Ashutosh Chauhan. Changes --- Addressed Sergey's comments. Bugs: HIVE-5297 https://issues.apache.org/jira/browse/HIVE-5297 Repository: hive-git Description --- Hive does not consider the type of the partition column while writing partitions. Consider for example the query: create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); Hive accepts this query. However if you try to select from this table and insert into another expecting schema match, it will insert nulls instead. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java fb79823 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION Diff: https://reviews.apache.org/r/14155/diff/ Testing --- Ran all tests. Thanks, Vikram Dixit Kumaraswamy
[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns
[ https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769059#comment-13769059 ] Vikram Dixit K commented on HIVE-5297: -- Second iteration. Hive does not honor type for partition columns -- Key: HIVE-5297 URL: https://issues.apache.org/jira/browse/HIVE-5297 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch Hive does not consider the type of the partition column while writing partitions. Consider, for example, the query: {noformat} create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) row format delimited fields terminated by ','; alter table tab1 add partition (month='June', day='second'); {noformat} Hive accepts this query. However, if you try to select from this table and insert into another table expecting the schemas to match, it will insert nulls instead. We should throw an exception on such a user error at the time the partition is added or loaded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
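The fix direction described in HIVE-5297 is to validate each partition value against the declared partition-column type at the time the partition is added, instead of silently storing a value that later reads back as NULL. A hedged sketch of that check (the method and type names are illustrative, not Hive's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionTypeCheck {
    // Validate each partition value against its declared column type.
    // Only int vs. string is sketched here; real Hive types would each
    // need their own parse check.
    static void validate(Map<String, String> partSpec, Map<String, String> colTypes) {
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            String type = colTypes.get(e.getKey());
            if ("int".equals(type)) {
                try {
                    Integer.parseInt(e.getValue());
                } catch (NumberFormatException nfe) {
                    throw new IllegalArgumentException("Partition column "
                        + e.getKey() + " declared int but got '" + e.getValue() + "'");
                }
            }
            // string columns accept any value
        }
    }

    public static void main(String[] args) {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("month", "string");
        types.put("day", "int");

        // Mirrors the JIRA example: day is declared int but given 'second'.
        Map<String, String> bad = new LinkedHashMap<>();
        bad.put("month", "June");
        bad.put("day", "second");
        try {
            validate(bad, types);
            System.out.println("accepted");
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

With a check like this, `alter table tab1 add partition (month='June', day='second')` would fail loudly at DDL time rather than producing NULLs at read time.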
[jira] [Created] (HIVE-5299) allow exposing metastore APIs from HiveServer2 with embedded metastore
Sergey Shelukhin created HIVE-5299: -- Summary: allow exposing metastore APIs from HiveServer2 with embedded metastore Key: HIVE-5299 URL: https://issues.apache.org/jira/browse/HIVE-5299 Project: Hive Issue Type: New Feature Components: HiveServer2, Metastore Reporter: Sergey Shelukhin There are (at least) two reasons to run the metastore as a standalone service rather than embedding it: access by non-Hive clients, and the miscellaneous advantages of a central service such as DB connection caching, etc. If HiveServer2 is used, as far as I can see, the latter does not require a standalone metastore (there's already a central service into which the metastore could be embedded). However, the former still does. We should consider exposing metastore APIs from HiveServer2 (configurable on/off). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5267) Use array instead of Collections if possible in DemuxOperator
[ https://issues.apache.org/jira/browse/HIVE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769065#comment-13769065 ] Yin Huai commented on HIVE-5267: +1. Will commit it if all tests pass. Use array instead of Collections if possible in DemuxOperator - Key: HIVE-5267 URL: https://issues.apache.org/jira/browse/HIVE-5267 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-5267.D12867.1.patch, HIVE-5267.patch DemuxOperator accesses Maps two or more times for each row, which can be replaced by array access. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
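The optimization in HIVE-5267 rests on the fact that DemuxOperator's tags are small, dense integers, so a per-row Map lookup (boxing plus hashing) can be replaced with a direct array index. A minimal sketch of the before/after shape (field and method names are illustrative, not the actual DemuxOperator code):

```java
import java.util.HashMap;
import java.util.Map;

public class DemuxSketch {
    // Before: per-row lookup in a Map keyed by an integer tag.
    // Each call boxes the int and computes a hash.
    static String lookupViaMap(Map<Integer, String> children, int tag) {
        return children.get(tag);
    }

    // After: tags run 0..n-1, so an array index does the same job
    // with no boxing and no hashing on the per-row hot path.
    static String lookupViaArray(String[] children, int tag) {
        return children[tag];
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(0, "groupBy");
        map.put(1, "join");
        String[] arr = {"groupBy", "join"};

        System.out.println(lookupViaMap(map, 1));   // join
        System.out.println(lookupViaArray(arr, 1)); // join
    }
}
```

The array form trades a little setup cost (sizing the array from the maximum tag) for a cheaper operation on every row, which is where operators like DemuxOperator spend their time.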