[jira] [Updated] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4222: - Attachment: HIVE-4222.D9681.2.patch Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch For example,
{noformat}
ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100));
String serialized = Utilities.serializeExpression(constant);
ExprNodeConstantDesc deserialized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration());
{noformat}
logs the error message
{noformat}
java.lang.InstantiationException: java.sql.Timestamp
Continuing ...
java.lang.RuntimeException: failed to evaluate: unbound=Class.new();
Continuing ...
{noformat}
and finally results in an NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716137#comment-13716137 ] Jason Dere commented on HIVE-4222: -- Updated patch with home-rolled PersistenceDelegate
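For context: Utilities.serializeExpression is built on java.beans.XMLEncoder, which by default cannot instantiate java.sql.Timestamp because the class has no no-argument constructor. The following is a minimal, self-contained sketch of the "home-rolled PersistenceDelegate" idea (illustrative only, not the attached patch; the class and method names here are made up):

```java
import java.beans.Encoder;
import java.beans.Expression;
import java.beans.PersistenceDelegate;
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.sql.Timestamp;

public class TimestampXmlDemo {
    public static Timestamp roundTrip(Timestamp ts) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        XMLEncoder enc = new XMLEncoder(bos);
        // java.sql.Timestamp has no no-arg constructor, so the default bean
        // persistence fails; supply a delegate that reconstructs the value
        // through the Timestamp(long) constructor instead.
        enc.setPersistenceDelegate(Timestamp.class, new PersistenceDelegate() {
            @Override
            protected Expression instantiate(Object oldInstance, Encoder out) {
                return new Expression(oldInstance, Timestamp.class, "new",
                        new Object[] { ((Timestamp) oldInstance).getTime() });
            }
        });
        enc.writeObject(ts);
        enc.close();
        XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()));
        Timestamp decoded = (Timestamp) dec.readObject();
        dec.close();
        return decoded;
    }

    public static void main(String[] args) {
        Timestamp original = new Timestamp(100);
        Timestamp copy = roundTrip(original);
        System.out.println(copy.getTime()); // 100
    }
}
```

The real patch also has to handle the nanosecond component; the sketch only shows the delegate mechanism.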
[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport
[ https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716145#comment-13716145 ] Chris Drome commented on HIVE-4911: --- [~brocknoland], I marked this patch as superseding HIVE-4225. HIVE-4225 only addresses the fact that HS2 was ignoring the hadoop.rpc.protection setting. The major limitation of HIVE-4225 is that it applies the QOP setting to both external and internal connections. HIVE-4911 improves on this by allowing separate configuration of external and internal connections. An example of where this is important is when the HS2 client connection must be encrypted, but the connection between HS2 and the JT/NN does not require encryption. Enable QOP configuration for Hive Server 2 thrift transport --- Key: HIVE-4911 URL: https://issues.apache.org/jira/browse/HIVE-4911 Project: Hive Issue Type: New Feature Reporter: Arup Malakar Assignee: Arup Malakar Attachments: HIVE-4911-trunk-0.patch The QOP for Hive Server 2 should be configurable to enable encryption. A new configuration option, hive.server2.thrift.rpc.protection, should be exposed. This would give greater control when configuring the Hive Server 2 service.
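For reference, hadoop.rpc.protection-style levels correspond to the standard SASL QOP strings (the values passed under javax.security.sasl.Sasl.QOP). A hedged sketch of that mapping follows; the class and method names are illustrative, not Hive's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: translating a hadoop.rpc.protection-style setting into the
// standard SASL QOP string. Names are illustrative only.
public class SaslQopMapping {
    public static String toSaslQop(String protection) {
        Map<String, String> qop = new HashMap<String, String>();
        qop.put("authentication", "auth");  // authentication only
        qop.put("integrity", "auth-int");   // adds integrity checking
        qop.put("privacy", "auth-conf");    // adds encryption
        String value = qop.get(protection);
        if (value == null) {
            throw new IllegalArgumentException("Unknown protection level: " + protection);
        }
        return value;
    }

    public static void main(String[] args) {
        // The "encrypted client connection" case from the comment above:
        System.out.println(toSaslQop("privacy")); // auth-conf
    }
}
```

Configuring external and internal connections separately then amounts to resolving two such settings to two (possibly different) QOP values.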
[jira] [Commented] (HIVE-2905) Desc table can't show non-ascii comments
[ https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716174#comment-13716174 ] Navis commented on HIVE-2905: - added to the FAQ (https://cwiki.apache.org/confluence/display/Hive/User+FAQ#UserFAQ-DoesHivesupportUnicode%3F) Desc table can't show non-ascii comments Key: HIVE-2905 URL: https://issues.apache.org/jira/browse/HIVE-2905 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0, 0.10.0 Environment: hive 0.7.0, mysql 5.1.45 hive 0.10.0, mysql 5.5.30 Reporter: Sheng Zhou Labels: patch Attachments: HIVE-2905.D11487.1.patch, utf8-desc-comment.patch When describing a table via the command line or Hive JDBC, the table's comment cannot be read. 1. I have updated the javax.jdo.option.ConnectionURL parameter in the hive-site.xml file: jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.
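A sketch of the configuration from step 1, building the javax.jdo.option.ConnectionURL value in code (the helper name is made up; useUnicode=true is a commonly paired MySQL Connector/J parameter, added here as an assumption, not taken from the report):

```java
// Sketch: building the metastore JDBC URL with an explicit character
// encoding so non-ASCII table comments survive the round trip through
// MySQL. Helper name is illustrative; useUnicode=true is an assumption.
public class MetastoreUrlDemo {
    public static String metastoreUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db
             + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(metastoreUrl("127.0.0.1", 3306, "hive"));
    }
}
```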
[jira] [Updated] (HIVE-4864) Code Comments seems confused between GenericUDFCase GenericUDFWhen
[ https://issues.apache.org/jira/browse/HIVE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated HIVE-4864: Affects Version/s: 0.9.0 Status: Patch Available (was: Open) Code Comments seems confused between GenericUDFCase GenericUDFWhen Key: HIVE-4864 URL: https://issues.apache.org/jira/browse/HIVE-4864 Project: Hive Issue Type: Task Affects Versions: 0.9.0 Reporter: Cheng Hao Priority: Trivial Attachments: 1.patch Code comment in GenericUDFCase:
/**
 * GenericUDF Class for SQL construct
 * CASE WHEN a THEN b WHEN c THEN d [ELSE f] END.
 *
 * NOTES: 1. a and c should be boolean, or an exception will be thrown. 2. b, d
 * and f should have the same TypeInfo, or an exception will be thrown.
 */
And the code comment in GenericUDFWhen:
/**
 * GenericUDF Class for SQL construct CASE a WHEN b THEN c [ELSE f] END.
 *
 * NOTES: 1. a and b should have the same TypeInfo, or an exception will be
 * thrown. 2. c and f should have the same TypeInfo, or an exception will be
 * thrown.
 */
From the code itself, it seems the comments should be swapped. We'd better amend them to avoid confusion.
[jira] [Updated] (HIVE-4864) Code Comments seems confused between GenericUDFCase GenericUDFWhen
[ https://issues.apache.org/jira/browse/HIVE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated HIVE-4864: Attachment: 1.patch Please check the attachment.
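To make the mix-up concrete: the searched form (CASE WHEN a THEN b ... END) tests boolean conditions, while the simple form (CASE a WHEN b THEN c ... END) compares one expression for equality against each WHEN operand. A toy sketch in plain Java, with invented names standing in for Hive's evaluators:

```java
public class CaseFormsDemo {
    // Searched form: CASE WHEN a THEN b WHEN c THEN d [ELSE f] END
    // -- the WHEN operands (a, c) are boolean conditions.
    static String searchedCase(int x) {
        if (x < 0) return "negative";
        if (x > 0) return "positive";
        return "zero";
    }

    // Simple form: CASE a WHEN b THEN c [ELSE f] END
    // -- a is compared for equality against each WHEN operand (b).
    static String simpleCase(String code) {
        if ("r".equals(code)) return "red";
        if ("g".equals(code)) return "green";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(searchedCase(-5)); // negative
        System.out.println(simpleCase("g"));  // green
    }
}
```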
[jira] [Updated] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4825: - Attachment: HIVE-4825.4.patch .4 is rebased to trunk Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors, etc. For Tez and MRR it will be useful to break this into map- and reduce-specific pieces. The separation is natural and I think has value in itself; it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez, where you'll have a graph of these instead of just 1M and 0-1R.
[jira] [Created] (HIVE-4916) Add TezWork
Gunther Hagleitner created HIVE-4916: Summary: Add TezWork Key: HIVE-4916 URL: https://issues.apache.org/jira/browse/HIVE-4916 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner TezWork is the class that encapsulates all the info needed to execute a single Tez job (i.e.: a dag of map or reduce work).
[jira] [Updated] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4916: - Fix Version/s: tez-branch
[jira] [Updated] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4916: - Attachment: HIVE-4916.1.patch.branch
[jira] [Created] (HIVE-4917) Tez Job Monitoring
Gunther Hagleitner created HIVE-4917: Summary: Tez Job Monitoring Key: HIVE-4917 URL: https://issues.apache.org/jira/browse/HIVE-4917 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch TezJobMonitor handles monitoring the execution of a Tez dag
[jira] [Updated] (HIVE-4917) Tez Job Monitoring
[ https://issues.apache.org/jira/browse/HIVE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4917: - Attachment: HIVE-4917.1.patch.branch
[jira] [Created] (HIVE-4918) Tez job submission
Gunther Hagleitner created HIVE-4918: Summary: Tez job submission Key: HIVE-4918 URL: https://issues.apache.org/jira/browse/HIVE-4918 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch This patch creates the infrastructure to submit a Tez dag (i.e.: TezTask + utils to convert work into a Tez dag).
[jira] [Updated] (HIVE-4918) Tez job submission
[ https://issues.apache.org/jira/browse/HIVE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4918: - Attachment: HIVE-4918.1.patch.branch
[jira] [Updated] (HIVE-4660) Let there be Tez
[ https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4660: - Description: Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm. The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs; But as a first step we can without much trouble start using more of Tez' features by taking advantage of the MRR pattern. MRR simply means that there can be any number of reduce stages following a single map stage - without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on keys without correlation (e.g.: join - grp by - window function - order by) For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez Let there be Tez Key: HIVE-4660 URL: https://issues.apache.org/jira/browse/HIVE-4660 Project: Hive Issue Type: New Feature Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner
[jira] [Updated] (HIVE-4660) Let there be Tez
[ https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4660: - Description: Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm. The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs; But as a first step we can without much trouble start using more of Tez' features by taking advantage of the MRR pattern. MRR simply means that there can be any number of reduce stages following a single map stage - without having to write intermediate results to HDFS and re-read them in a new job. This is common when queries require multiple shuffles on keys without correlation (e.g.: join - grp by - window function - order by) For more details see the design doc here:
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated July 23, 2013, 9:51 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Changes --- Updated with Jakob's feedback. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL CREATE TABLE command. It currently requires specifying an Avro schema literal or schema file name, which is very inconvenient for users in many cases. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema. Diffs (updated) -
ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION
ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION
ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION
ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION
ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION
ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION
ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION
ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6
serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION
Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new Java test class for a new Java class. Added a new test case to an existing Java test class. In addition, there are 4 .q files for testing multiple use cases. Thanks, Mohammad Islam
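As a rough illustration of what a TypeInfoToSchema conversion involves, here is a hedged sketch of the primitive-type portion only. The class name and the table below are invented for illustration; the real converter must also handle complex types (structs, maps, lists, unions) and nullability:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: Hive primitive type names mapped to Avro schema
// type strings. Hypothetical helper, not the class under review.
public class HiveToAvroPrimitives {
    private static final Map<String, String> MAPPING = new HashMap<String, String>();
    static {
        MAPPING.put("string", "string");
        MAPPING.put("int", "int");
        MAPPING.put("bigint", "long");
        MAPPING.put("smallint", "int");  // Avro has no 16-bit integer type
        MAPPING.put("tinyint", "int");   // nor an 8-bit one
        MAPPING.put("boolean", "boolean");
        MAPPING.put("float", "float");
        MAPPING.put("double", "double");
        MAPPING.put("binary", "bytes");
    }

    public static String avroType(String hiveType) {
        String avro = MAPPING.get(hiveType);
        if (avro == null) {
            throw new IllegalArgumentException("Unmapped Hive type: " + hiveType);
        }
        return avro;
    }

    public static void main(String[] args) {
        System.out.println(avroType("bigint")); // long
    }
}
```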
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716282#comment-13716282 ] Hive QA commented on HIVE-4825: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593669/HIVE-4825.4.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/149/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/149/console Messages:
{noformat}
Executing org.apache.hive.ptest.execution.CleanupPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests failed with: IllegalStateException: Too many bad hosts: 1.0% (10 / 10) is greater than threshold of 50%
{noformat}
This message is automatically generated.
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716430#comment-13716430 ] Brock Noland commented on HIVE-4825: There was a large price spike in spot instances overnight. I kicked this off again. Also that error message needs to be cleaned up.
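The odd "1.0% (10 / 10)" in the QA output looks like a fraction formatted as if it were already a percentage. A hedged sketch of the kind of cleanup meant here (the helper is hypothetical, not ptest2 code):

```java
import java.util.Locale;

// Sketch: format the bad-host ratio as a real percentage rather than
// printing the raw fraction with a percent sign. Hypothetical helper.
public class BadHostMessage {
    public static String format(int bad, int total) {
        double pct = 100.0 * bad / total; // scale the fraction to percent
        return String.format(Locale.ROOT,
                "Too many bad hosts: %.1f%% (%d / %d)", pct, bad, total);
    }

    public static void main(String[] args) {
        System.out.println(format(10, 10)); // Too many bad hosts: 100.0% (10 / 10)
    }
}
```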
[jira] [Commented] (HIVE-4907) Allow additional tests cases to be specified with -Dtestcase
[ https://issues.apache.org/jira/browse/HIVE-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716438#comment-13716438 ] Brock Noland commented on HIVE-4907: Yes, this is exactly what I was looking for. Allow additional tests cases to be specified with -Dtestcase Key: HIVE-4907 URL: https://issues.apache.org/jira/browse/HIVE-4907 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4907.patch Currently we only allow a single test case to be specified with -Dtestcase. It'd be ideal if we could add additional test cases, as this would allow us to batch the unit tests in ptest2.
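The idea can be sketched as splitting the -Dtestcase property value on commas so it can name several tests (illustrative only, not the attached patch):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: accept several test case names in one -Dtestcase value,
// e.g. -Dtestcase=TestCliDriver,TestParse. Names are illustrative.
public class TestCaseArg {
    public static List<String> parse(String testcaseProperty) {
        // Split on commas, tolerating stray whitespace around them.
        return Arrays.asList(testcaseProperty.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(parse("TestCliDriver, TestParse"));
    }
}
```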
[jira] [Updated] (HIVE-4907) Allow additional tests cases to be specified with -Dtestcase
[ https://issues.apache.org/jira/browse/HIVE-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4907: --- Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-4864) Code Comments seems confused between GenericUDFCase GenericUDFWhen
[ https://issues.apache.org/jira/browse/HIVE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716449#comment-13716449 ] Ashutosh Chauhan commented on HIVE-4864: [~chenghao] Can you put these comments in the Description annotation (see other UDFs for an example)? That way they will show up to users when they run describe on these UDFs.
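For illustration, a Description annotation on a UDF class looks roughly like the sketch below. The annotation is re-declared locally so the sketch compiles on its own; Hive's real annotation is org.apache.hadoop.hive.ql.exec.Description, and the UDF class here is a made-up stand-in:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for Hive's Description annotation so the sketch is
// self-contained. Hive reads these fields to answer DESCRIBE FUNCTION.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Description {
    String name();
    String value();
    String extended() default "";
}

// Hypothetical UDF class carrying the searched-CASE documentation.
@Description(name = "when",
    value = "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - "
          + "returns b when a is true, d when c is true, else e")
class GenericUDFWhenSketch { }

public class DescriptionDemo {
    public static void main(String[] args) {
        Description d = GenericUDFWhenSketch.class.getAnnotation(Description.class);
        System.out.println(d.name()); // when
    }
}
```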
[jira] [Updated] (HIVE-4864) Code Comments seems confused between GenericUDFCase GenericUDFWhen
[ https://issues.apache.org/jira/browse/HIVE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4864: --- Assignee: Cheng Hao Affects Version/s: 0.10.0 0.11.0 Status: Open (was: Patch Available)
Does HiveServer2 support delegation token?
Hi all, HiveMetastore supports delegation tokens. Does HiveServer2 support them as well? If not, do we have a plan for this? Besides, the hive wiki says: hive.server2.authentication - Authentication mode, default NONE. Options are NONE, KERBEROS, LDAP and CUSTOM. Will HiveServer2 support PAM, which could be configured to use multiple authentication methods such as OS accounts or LDAP, as well? Thanks, - Bing
[jira] [Commented] (HIVE-4670) Authentication module should pass the instance part of the Kerberos principal
[ https://issues.apache.org/jira/browse/HIVE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716468#comment-13716468 ] Ashutosh Chauhan commented on HIVE-4670: The primary use case for the remote user variable is audit logging. Isn't it useful to have the realm in there as well? Authentication module should pass the instance part of the Kerberos principal - Key: HIVE-4670 URL: https://issues.apache.org/jira/browse/HIVE-4670 Project: Hive Issue Type: Bug Components: Authentication, HiveServer2 Affects Versions: 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4670.2.patch, HIVE-4670.3.patch When Kerberos authentication is enabled for HiveServer2, the thrift SASL layer passes instance@realm from the principal. It should instead strip the realm and pass just the instance part of the principal. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
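The fix HIVE-4670 describes amounts to a string transformation: the SASL layer yields "instance@REALM", and only the part before the '@' should be reported. This is a sketch of that idea under assumed names (KerberosNameUtil and stripRealm are illustrative, not the actual patch):

```java
// Sketch of the behavior described in HIVE-4670: given the remote user string
// the thrift SASL layer yields (e.g. "instance@REALM"), strip the realm so
// only the instance part is passed on. Class and method names are
// illustrative, not taken from the patch.
class KerberosNameUtil {
  // "instance@REALM" -> "instance"; strings without a realm pass through.
  static String stripRealm(String principal) {
    int at = principal.indexOf('@');
    return at < 0 ? principal : principal.substring(0, at);
  }

  public static void main(String[] args) {
    System.out.println(stripRealm("node1.example.com@EXAMPLE.COM"));
  }
}
```

Ashutosh's follow-up question is exactly about this trade-off: dropping the realm simplifies the user name, but an audit log may want the full principal.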
[jira] [Commented] (HIVE-4225) HiveServer2 does not support SASL QOP
[ https://issues.apache.org/jira/browse/HIVE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716482#comment-13716482 ] Joey Echeverria commented on HIVE-4225: --- I haven't had a chance to review HIVE-4911 but so long as it works for both HS2 and the Metastore Server, I'm good with having an independent configuration. HiveServer2 does not support SASL QOP - Key: HIVE-4225 URL: https://issues.apache.org/jira/browse/HIVE-4225 Project: Hive Issue Type: Bug Components: HiveServer2, Shims Affects Versions: 0.11.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-4225-1.patch, HIVE-4225.D10959.1.patch, HIVE-4225.patch HiveServer2 implements Kerberos authentication through SASL framework, but does not support setting QOP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
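For context on what "setting QOP" means at the SASL layer: a server passes javax.security.sasl.Sasl.QOP in the properties map when creating its SaslServer, choosing "auth" (authentication only), "auth-int" (adds integrity), or "auth-conf" (adds confidentiality, i.e. encryption). The helper below is a minimal sketch of building that map; how Hive wires it through its shims is the subject of the patches on this issue and is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

// Minimal sketch: the SASL properties a server would pass when creating a
// SaslServer, with the QOP made configurable. forQop() is an illustrative
// helper, not a Hive API.
class SaslQopProps {
  static Map<String, String> forQop(String qop) {
    Map<String, String> props = new HashMap<>();
    props.put(Sasl.QOP, qop);            // "auth", "auth-int", or "auth-conf"
    props.put(Sasl.SERVER_AUTH, "true"); // require mutual authentication
    return props;
  }

  public static void main(String[] args) {
    System.out.println(forQop("auth-conf").get(Sasl.QOP));
  }
}
```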
[jira] [Commented] (HIVE-4871) Apache builds fail with Target make-pom does not exist in the project hcatalog.
[ https://issues.apache.org/jira/browse/HIVE-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716488#comment-13716488 ] Ashutosh Chauhan commented on HIVE-4871: This is still publishing hcatalog artifacts in separate namespace? Apache builds fail with Target make-pom does not exist in the project hcatalog. --- Key: HIVE-4871 URL: https://issues.apache.org/jira/browse/HIVE-4871 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4871.patch Original Estimate: 168h Remaining Estimate: 168h For example, https://builds.apache.org/job/Hive-trunk-h0.21/2192/console. All unit tests pass, but deployment of build artifacts fails. HIVE-4387 provided a bandaid for 0.11. Need to figure out long term fix for this for 0.12. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4871) Apache builds fail with Target make-pom does not exist in the project hcatalog.
[ https://issues.apache.org/jira/browse/HIVE-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716508#comment-13716508 ] Eugene Koifman commented on HIVE-4871: -- yes. I'll change the maven coordinates when I change the package name (later this week) Apache builds fail with Target make-pom does not exist in the project hcatalog. --- Key: HIVE-4871 URL: https://issues.apache.org/jira/browse/HIVE-4871 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4871.patch Original Estimate: 168h Remaining Estimate: 168h For example, https://builds.apache.org/job/Hive-trunk-h0.21/2192/console. All unit tests pass, but deployment of build artifacts fails. HIVE-4387 provided a bandaid for 0.11. Need to figure out long term fix for this for 0.12. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4547) A complex create view statement fails with new Antlr 3.4
[ https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716546#comment-13716546 ] Ashutosh Chauhan commented on HIVE-4547: +1 A complex create view statement fails with new Antlr 3.4 Key: HIVE-4547 URL: https://issues.apache.org/jira/browse/HIVE-4547 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-4547-1.patch, HIVE-4547-repro.tar A complex create view statement with CAST in join condition fails with IllegalArgumentException error. This is exposed by the Antlr 3.4 upgrade (HIVE-2439). The same statement works fine with Hive 0.9 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4892) PTest2 cleanup after merge
[ https://issues.apache.org/jira/browse/HIVE-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716550#comment-13716550 ] Ashutosh Chauhan commented on HIVE-4892: There are a few new files in there which look like test logs. Are those needed? PTest2 cleanup after merge -- Key: HIVE-4892 URL: https://issues.apache.org/jira/browse/HIVE-4892 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4892.patch, HIVE-4892.patch HIVE-4675 was merged but there are still a few minor issues we need to clean up:
* README is out of date
* Need to limit the number of failed source directories we copy back from the slaves
* when looking for TEST-*.xml files we look at both the log directory (good) and the failed source directories (bad), therefore duplicating failures in the jenkins report
* We need to process bad hosts in the finally block of PTest.run (HIVE-4882)
* Need a mechanism to clean the ivy and maven cache (HIVE-4882)
* PTest2 fails to publish a comment to a JIRA sometimes (HIVE-4889)
* Now that PTest2 is committed to the source tree it's copying in our TEST-SomeTest*.xml files
Test Properties: NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716555#comment-13716555 ] Ashutosh Chauhan commented on HIVE-4611: [~vikram.dixit] Can you create phabricator or RB entry for this? SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4892) PTest2 cleanup after merge
[ https://issues.apache.org/jira/browse/HIVE-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716564#comment-13716564 ] Brock Noland commented on HIVE-4892: Hi, PTest2 parses the TEST-\*.xml logs and the patch does include sample TEST outputs for unit testing purposes. Unfortunately in the current version they are named TEST..xml in the source tree, which is causing ptest2 issues when it finds these outputs. This patch renames them to remove the TEST prefix. There are also a few other very small outputs such as hive.log for the same purpose. These already exist in the source tree but are being renamed. When we get this committed I can submit a performance improvement patch that should increase throughput of the pre-commit tests by 2x. PTest2 cleanup after merge -- Key: HIVE-4892 URL: https://issues.apache.org/jira/browse/HIVE-4892 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4892.patch, HIVE-4892.patch HIVE-4675 was merged but there are still a few minor issues we need to clean up:
* README is out of date
* Need to limit the number of failed source directories we copy back from the slaves
* when looking for TEST-*.xml files we look at both the log directory (good) and the failed source directories (bad), therefore duplicating failures in the jenkins report
* We need to process bad hosts in the finally block of PTest.run (HIVE-4882)
* Need a mechanism to clean the ivy and maven cache (HIVE-4882)
* PTest2 fails to publish a comment to a JIRA sometimes (HIVE-4889)
* Now that PTest2 is committed to the source tree it's copying in our TEST-SomeTest*.xml files
Test Properties: NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716569#comment-13716569 ] Vikram Dixit K commented on HIVE-4611: -- Review board request: https://reviews.apache.org/r/12827/ SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716573#comment-13716573 ] Eric Hanson commented on HIVE-4642: --- Hi Teddy, how's it going with this? When do you think you can finish this up? Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport
[ https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716612#comment-13716612 ] Brock Noland commented on HIVE-4911: Arup, Does this work for both [HS2 and HMS|https://issues.apache.org/jira/browse/HIVE-4225?focusedCommentId=13716482page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13716482]? Also, in regards to SaslQOP, is there a reason you don't use valueOf() as opposed to implementing fromString()? Enable QOP configuration for Hive Server 2 thrift transport --- Key: HIVE-4911 URL: https://issues.apache.org/jira/browse/HIVE-4911 Project: Hive Issue Type: New Feature Reporter: Arup Malakar Assignee: Arup Malakar Attachments: HIVE-4911-trunk-0.patch The QoP for HiveServer2 should be configurable to enable encryption. A new configuration, hive.server2.thrift.rpc.protection, should be exposed. This would give greater control when configuring the HiveServer2 service. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
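On Brock's valueOf() question: one plausible reason for a hand-rolled fromString() is that the wire values ("auth", "auth-int", "auth-conf") are lowercase and contain '-', which is not a legal Java identifier, so Enum.valueOf() cannot map a config value to a constant directly. The enum below is a sketch of that pattern, not the class from the patch:

```java
// Sketch of a SaslQOP enum where the constant name and the SASL wire value
// differ. Enum.valueOf("auth-conf") would throw IllegalArgumentException
// (no constant is named "auth-conf"), so a fromString() that matches the
// wire value case-insensitively is needed. Illustrative only.
enum SaslQOP {
  AUTH("auth"), AUTH_INT("auth-int"), AUTH_CONF("auth-conf");

  final String saslQop;  // the value actually passed to the SASL layer
  SaslQOP(String saslQop) { this.saslQop = saslQop; }

  static SaslQOP fromString(String s) {
    for (SaslQOP q : values()) {
      if (q.saslQop.equalsIgnoreCase(s)) return q;
    }
    throw new IllegalArgumentException("Unknown SASL QOP: " + s);
  }

  public static void main(String[] args) {
    System.out.println(fromString("AUTH-CONF"));  // case-insensitive match
  }
}
```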
[jira] [Commented] (HIVE-4892) PTest2 cleanup after merge
[ https://issues.apache.org/jira/browse/HIVE-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716629#comment-13716629 ] Ashutosh Chauhan commented on HIVE-4892: +1 PTest2 cleanup after merge -- Key: HIVE-4892 URL: https://issues.apache.org/jira/browse/HIVE-4892 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4892.patch, HIVE-4892.patch HIVE-4675 was merged but there are still a few minor issues we need to cleanup:
* README is out of date
* Need to limit the number of failed source directories we copy back from the slaves
* when looking for TEST-*.xml files we look at both the log directory (good) and the failed source directories (bad) therefore duplicating failures in jenkins report
* We need to process bad hosts in the finally block of PTest.run (HIVE-4882)
* Need a mechanism to clean the ivy and maven cache (HIVE-4882)
* PTest2 fails to publish a comment to a JIRA sometimes (HIVE-4889)
* Now that PTest2 is committed to the source tree it's copying in our TEST-SomeTest*.xml files
Test Properties: NO PRECOMMIT TESTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4222: - Attachment: HIVE-4222.D9681.3.patch Update patch to remove date test, since Date type does not yet exist on trunk. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
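The InstantiationException in this issue happens because java.sql.Timestamp has no no-arg constructor, so XMLEncoder's default persistence delegate cannot re-create it on JDK 1.6 and earlier (JDK 7 ships a built-in delegate for it). The "home-rolled PersistenceDelegate" approach mentioned for patch 2 can be sketched as below; this is a self-contained illustration, not the patch itself, and for brevity it restores only the millisecond value, whereas a complete delegate would restore the nanos field too.

```java
import java.beans.Encoder;
import java.beans.Expression;
import java.beans.PersistenceDelegate;
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.sql.Timestamp;

// Illustration of a custom PersistenceDelegate for java.sql.Timestamp:
// tell XMLEncoder to reconstruct via new Timestamp(long) instead of the
// (nonexistent) no-arg constructor. Simplified: the nanos field is ignored.
class TimestampXmlDemo {
  static byte[] encode(Timestamp ts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    XMLEncoder enc = new XMLEncoder(out);
    enc.setPersistenceDelegate(Timestamp.class, new PersistenceDelegate() {
      @Override
      protected Expression instantiate(Object o, Encoder e) {
        // Serialize as a call to new Timestamp(getTime()).
        return new Expression(o, o.getClass(), "new",
            new Object[] { ((Timestamp) o).getTime() });
      }
    });
    enc.writeObject(ts);
    enc.close();
    return out.toByteArray();
  }

  static Timestamp decode(byte[] xml) {
    XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(xml));
    Timestamp ts = (Timestamp) dec.readObject();
    dec.close();
    return ts;
  }

  public static void main(String[] args) {
    Timestamp round = decode(encode(new Timestamp(100)));
    System.out.println(round.getTime());
  }
}
```

Without the setPersistenceDelegate call, the decode of a Timestamp fails on JDK 1.6 with exactly the InstantiationException quoted in the issue description.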
[jira] [Commented] (HIVE-2702) listPartitionsByFilter only supports string partitions for equals
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716638#comment-13716638 ] Phabricator commented on HIVE-2702: --- ashutoshc has accepted the revision HIVE-2702 [jira] listPartitionsByFilter only supports string partitions. +1. Sergey, How did the tests go? REVISION DETAIL https://reviews.facebook.net/D11715 BRANCH HIVE-2702 ARCANIST PROJECT hive To: JIRA, ashutoshc, sershe listPartitionsByFilter only supports string partitions for equals - Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports only string partitions. This is because it's explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java:
//Can only support partitions whose types are string
if( ! table.getPartitionKeys().get(partitionColumnIndex).
    getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
  throw new MetaException("Filtering is supported only on partition keys of type string");
}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4611: - Status: Patch Available (was: Open) SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4611) SMB joins fail based on bigtable selection policy.
[ https://issues.apache.org/jira/browse/HIVE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4611: - Status: Open (was: Patch Available) SMB joins fail based on bigtable selection policy. -- Key: HIVE-4611 URL: https://issues.apache.org/jira/browse/HIVE-4611 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.1 Attachments: HIVE-4611.2.patch, HIVE-4611.3.patch, HIVE-4611.4.patch, HIVE-4611.patch The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy will choose the big table as the one with largest average partition size. However, this can result in a query failing because this policy conflicts with the big table candidates chosen for outer joins. This policy should just be a tie breaker and not have the ultimate say in the choice of tables. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716667#comment-13716667 ] Hive QA commented on HIVE-4825: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593669/HIVE-4825.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2647 tests executed *Failed tests:* {noformat} org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/150/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/150/console Messages: {noformat} Executing org.apache.hive.ptest.execution.CleanupPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors etc. For Tez and MRR it will be useful to break this into map and reduce specific pieces. The separation is natural and I think has value in itself, it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez where you'll have a graph of these instead of just 1M and 0-1R. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
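The HIVE-4825 description above can be pictured as a small class split. The sketch below is heavily simplified and all field names are illustrative, not the actual classes from the patch: the monolithic plan object becomes a map-side piece plus a reduce-side piece, so a classic MR job is "one MapWork and zero or one ReduceWork", while Tez/MRR can hold a whole graph of such pieces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Map-side half of the plan: input paths, table aliases, map operator tree, etc.
class MapWork {
  Map<String, List<String>> pathToAliases = new LinkedHashMap<>();  // input dir -> aliases
}

// Reduce-side half: reducer count and the tagging info used to route join inputs.
class ReduceWork {
  int numReduceTasks;  // 0 or 1 reducer in classic MR; a Tez DAG can chain many
  Map<Integer, String> tagToAlias = new LinkedHashMap<>();
}

// MapredWork becomes a thin pair of the two, per the issue description.
class MapredWork {
  MapWork mapWork = new MapWork();
  ReduceWork reduceWork;  // null for map-only jobs

  boolean isMapOnly() { return reduceWork == null; }

  public static void main(String[] args) {
    MapredWork w = new MapredWork();
    System.out.println(w.isMapOnly());
  }
}
```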
HIVE-4266 - Refactor HCatalog code to org.apache.hive.hcatalog
I'm planning to change the package name of all hcatalog classes sometime this week (as was promised for 0.12). This is likely to affect any outstanding hcatalog patches on trunk. Please try to have them checked in as soon as possible. Thanks, Eugene
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716901#comment-13716901 ] Hive QA commented on HIVE-4222: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593736/HIVE-4222.D9681.3.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/151/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/151/console Messages: {noformat} Executing org.apache.hive.ptest.execution.CleanupPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests failed with: IllegalStateException: Too many bad hosts: 1.0% (10 / 10) is greater than threshold of 50% {noformat} This message is automatically generated. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716904#comment-13716904 ] Brock Noland commented on HIVE-4222: I kicked this off again. Interesting that we are seeing this twice today. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4684) Query with filter constant on left of = and column expression on right does not vectorize
[ https://issues.apache.org/jira/browse/HIVE-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4684: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Jitendra! Query with filter constant on left of = and column expression on right does not vectorize --- Key: HIVE-4684 URL: https://issues.apache.org/jira/browse/HIVE-4684 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Eric Hanson Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: Hive-4684.0.patch, Hive-4684.1.patch, HIVE-4684.1.patch, HIVE-4684.2.patch, HIVE-4684.3.patch select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Does not go down the vectorization path. Output: hive select dmachineid from factsqlengineam_vec_orc where 1073 = dmachineid + 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Validating if vectorized execution is applicable Cannot vectorize the plan: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hiv e.ql.exec.vector.expressions.gen.FilterLongScalarEqualLongColumn Starting Job = job_201306061504_0038, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306061504_0038 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -kill job_201306061504_0038 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 0 2013-06-07 10:25:30,932 Stage-1 map = 0%, reduce = 0% 2013-06-07 10:25:39,953 Stage-1 map = 25%, reduce = 0% 2013-06-07 10:25:42,959 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec 2013-06-07 10:25:43,962 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 8.172 sec ... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717501#comment-13717501 ] Hive QA commented on HIVE-4222: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593736/HIVE-4222.D9681.3.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/152/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/152/console Messages: {noformat} Executing org.apache.hive.ptest.execution.CleanupPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests failed with: IllegalStateException: Too many bad hosts: 1.0% (10 / 10) is greater than threshold of 50% {noformat} This message is automatically generated. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4900) Fix the mismatched column names in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717507#comment-13717507 ] Ashutosh Chauhan commented on HIVE-4900: I see. Then is this sufficient for us to upgrade to DN 3.x? If yes, then it seems there is no need for a DB upgrade (at least for MySQL). Is that correct? Fix the mismatched column names in package.jdo -- Key: HIVE-4900 URL: https://issues.apache.org/jira/browse/HIVE-4900 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4900.1.patch, HIVE-4900.2.patch, HIVE-4900.patch There are several errors in the DataNucleus O-R mapping file, package.jdo, which the existing DN version does not complain about. These errors may be subject to future DN complaints (as experienced in HIVE-3632 and HIVE-2084). However, it is still better if we fix these errors, as they also create some confusion in the community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717514#comment-13717514 ] Brock Noland commented on HIVE-4222: Spot instance prices are insane today for the instance type we use. I've created HIVE-4920 to improve our handling of this. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4920) PTest2 spot instances should fall back on c1.xlarge and then on-demand instances
Brock Noland created HIVE-4920: -- Summary: PTest2 spot instances should fall back on c1.xlarge and then on-demand instances Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Today the price for m1.xlarge instances has been varying dramatically. We should fall back on c1.xlarge (which is more powerful and is cheaper at present) and then on on-demand instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Any Test Program for Testing ORCFile Code?
Hi All, Is there any test program for testing ORCFile? If yes, can someone please give instructions on how to invoke it? I simply want to step through the code to understand it quickly. Best, Riini
Re: Review Request 12795: [HIVE-4827] Merge a Map-only job to its following MapReduce job with multiple inputs
On July 23, 2013, 2:22 a.m., Gunther Hagleitner wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java, line 4 https://reviews.apache.org/r/12795/diff/2/?file=324291#file324291line4 Don't we still need the copyright? We do not need the header. This copyright line was from the trunk. On July 23, 2013, 2:22 a.m., Gunther Hagleitner wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java, line 420 https://reviews.apache.org/r/12795/diff/2/?file=324292#file324292line420 Why is it better to throw an exception here than simply return? If we only expect a TableScanOperator here, I think it may be better to throw an exception instead of swallowing the error. On July 23, 2013, 2:22 a.m., Gunther Hagleitner wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java, line 446 https://reviews.apache.org/r/12795/diff/2/?file=324292#file324292line446 Why is this? Should work regardless, no? This part is from trunk. I will take a look and see why we need it - Yin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12795/#review23671 --- On July 22, 2013, 4:19 a.m., Yin Huai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12795/ --- (Updated July 22, 2013, 4:19 a.m.) Review request for hive.
Bugs: HIVE-4827 https://issues.apache.org/jira/browse/HIVE-4827 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-4827 Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 66b84ff ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java f98878c ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 7cbb1ff ql/src/test/queries/clientpositive/correlationoptimizer7.q 9b18972 ql/src/test/queries/clientpositive/multiMapJoin2.q PRE-CREATION ql/src/test/results/clientpositive/auto_join33.q.out 8fc0e84 ql/src/test/results/clientpositive/correlationoptimizer1.q.out db3bd78 ql/src/test/results/clientpositive/correlationoptimizer3.q.out cebddff ql/src/test/results/clientpositive/correlationoptimizer4.q.out 285a54f ql/src/test/results/clientpositive/correlationoptimizer6.q.out c40a786 ql/src/test/results/clientpositive/correlationoptimizer7.q.out ea54431 ql/src/test/results/clientpositive/multiMapJoin1.q.out 3b3eb3f ql/src/test/results/clientpositive/multiMapJoin2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/12795/diff/ Testing --- Running tests. Thanks, Yin Huai
[jira] [Commented] (HIVE-4900) Fix the mismatched column names in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717530#comment-13717530 ] Xuefu Zhang commented on HIVE-4900: --- That's correct. I don't think any DB upgrade is needed for the purpose of the DN upgrade; HIVE-4900.2.patch alone is needed for that. As mentioned in HIVE-2084, there needs to be an upgrade of the comment column for tables TYPE_FIELDS and COLUMNS_V2, which needs to be brought up from the current 256 to 4000. I'll create a separate JIRA for that, as it's irrelevant to the DN upgrade. Fix the mismatched column names in package.jdo -- Key: HIVE-4900 URL: https://issues.apache.org/jira/browse/HIVE-4900 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4900.1.patch, HIVE-4900.2.patch, HIVE-4900.patch There are several errors in DataNucleus O-R mapping file, package.jdo, which are not complained by the existing DN version. These errors may be subject to future DN complaint (as experienced in HIVE-3632 and HIVE-2084). However, it is still better if we fix these errors as it also creates some confusion in the community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4921) Upgrade COMMENT column size in table COLUMNS_V2 and TYPE_FIELDS from 256 to 4000
Xuefu Zhang created HIVE-4921: - Summary: Upgrade COMMENT column size in table COLUMNS_V2 and TYPE_FIELDS from 256 to 4000 Key: HIVE-4921 URL: https://issues.apache.org/jira/browse/HIVE-4921 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0, 0.10.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang There are three tables in Hive metastore schema having COMMENT COLUMN: PARTITIION_KEYS, COLUMNS_V2, and TYPE_FIELDS, and their sizes are different. PARTITIION_KEYS.COMMENT has a size of 4000. To be consistent, and to make it more reasonable, we need to promote the column in other two tables from the current size (256) to 4000. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4900) Fix the mismatched column names in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717545#comment-13717545 ] Ashutosh Chauhan commented on HIVE-4900: Make sense. Lets limit this jira for this fix only. This looks good to me. Do you plan to run more tests or is this ready to get in? Fix the mismatched column names in package.jdo -- Key: HIVE-4900 URL: https://issues.apache.org/jira/browse/HIVE-4900 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4900.1.patch, HIVE-4900.2.patch, HIVE-4900.patch There are several errors in DataNucleus O-R mapping file, package.jdo, which are not complained by the existing DN version. These errors may be subject to future DN complaint (as experienced in HIVE-3632 and HIVE-2084). However, it is still better if we fix these errors as it also creates some confusion in the community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4920) PTest2 spot instances should fall back on c1.xlarge and then on-demand instances
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717549#comment-13717549 ] Edward Capriolo commented on HIVE-4920: --- What is the daily cost for running? With the backlog of patches we have we may be running for a bit. PTest2 spot instances should fall back on c1.xlarge and then on-demand instances Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Critical Today the price for m1.xlarge instances has been varying dramatically. We should fall back on c1.xlarge (which is more powerful and is cheaper at present) and then on on-demand instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4900) Fix the mismatched column names in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717557#comment-13717557 ] Ashutosh Chauhan commented on HIVE-4900: Cool. +1 Fix the mismatched column names in package.jdo -- Key: HIVE-4900 URL: https://issues.apache.org/jira/browse/HIVE-4900 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4900.1.patch, HIVE-4900.2.patch, HIVE-4900.patch There are several errors in DataNucleus O-R mapping file, package.jdo, which are not complained by the existing DN version. These errors may be subject to future DN complaint (as experienced in HIVE-3632 and HIVE-2084). However, it is still better if we fix these errors as it also creates some confusion in the community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Any Test Program for Testing ORCFile Code?
Hi Riini, The source code for ORC is in the org.apache.hadoop.hive.ql.io.orc package inside the ql/src/java folder. The test cases corresponding to ORC are in the ql/src/test folder under the same package structure. TestOrcFile should be a good starting point (for debugging); it reads/writes nested structs in ORC format. Thanks Prasanth On Jul 23, 2013, at 12:32 PM, Rini Kaushik kaush...@us.ibm.com wrote: Hi All, Is there any test program for testing ORCFile? If yes, can someone please give instructions on how to invoke it? I simply want to step through the code to understand it quickly. Best, Riini
[jira] [Commented] (HIVE-4900) Fix the mismatched column names in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717555#comment-13717555 ] Xuefu Zhang commented on HIVE-4900: --- I completed my testing and I think it's ready. Fix the mismatched column names in package.jdo -- Key: HIVE-4900 URL: https://issues.apache.org/jira/browse/HIVE-4900 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4900.1.patch, HIVE-4900.2.patch, HIVE-4900.patch There are several errors in DataNucleus O-R mapping file, package.jdo, which are not complained by the existing DN version. These errors may be subject to future DN complaint (as experienced in HIVE-3632 and HIVE-2084). However, it is still better if we fix these errors as it also creates some confusion in the community. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4920) PTest2 spot instances should fall back on c1.xlarge and then on-demand instances
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717581#comment-13717581 ] Brock Noland commented on HIVE-4920: It varies quite a bit but I think it averages to about $1-2 per hour. The past few days there have been large spikes in the prices of spot instances, which results in our slave instances being terminated. See attached. c1.xlarge seems to be more stable but also in less supply. I think that I will change to using c1.xlarge first, then fall back to m1.xlarge, and then to on-demand instances. PTest2 spot instances should fall back on c1.xlarge and then on-demand instances Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Critical Attachments: Screen Shot 2013-07-23 at 3.35.00 PM.png Today the price for m1.xlarge instances has been varying dramatically. We should fall back on c1.xlarge (which is more powerful and is cheaper at present) and then on on-demand instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4920) PTest2 spot instances should fall back on c1.xlarge and then on-demand instances
[ https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4920: --- Attachment: Screen Shot 2013-07-23 at 3.35.00 PM.png PTest2 spot instances should fall back on c1.xlarge and then on-demand instances Key: HIVE-4920 URL: https://issues.apache.org/jira/browse/HIVE-4920 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Critical Attachments: Screen Shot 2013-07-23 at 3.35.00 PM.png Today the price for m1.xlarge instances has been varying dramatically. We should fall back on c1.xlarge (which is more powerful and is cheaper at present) and then on on-demand instances. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717596#comment-13717596 ] Owen O'Malley commented on HIVE-4123: - {quote} 1) In the current implementation, I kept the delta base field as optional (used only for fixed delta runs) and zigzag encoded the delta blob so that we don't have to deal with sign of the deltas. I can change delta base field to mandatory field to store the base (absolute min) value of delta values and zigzag encode it. With base value and delta base value, we should be able to identify if the sequence is monotonically increasing or decreasing and also we can identify the sign of the delta values. I hope this is what you are looking for. Please correct me if my understanding is wrong. {quote} I think it will be worthwhile always having the delta base and keeping the additional delta as an unsigned remainder. {quote} 2) is there any way we can reuse the Orc's MAJOR and MINOR version as supported in HIVE-4724 to figure out if we need use new integer encoding or old integer encoding? {quote} Yeah, I need to add more framework for that code. I'm leaning toward passing in a factory object that creates the right integer encoder. The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Prasanth J Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
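The zigzag encoding discussed above maps signed values onto unsigned ones so that the sign of the deltas need not be handled separately; small-magnitude values (positive or negative) become small unsigned numbers that bit-pack tightly. A minimal self-contained sketch of the standard mapping (the same one protobuf varints use; this is not ORC's actual code):

```java
public class ZigZag {
    // Interleave signed values onto unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long encode(long n) {
        return (n << 1) ^ (n >> 63); // arithmetic shift replicates the sign bit
    }

    // Inverse mapping: recover the original signed value.
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        System.out.println(encode(-1)); // 1
        System.out.println(encode(1));  // 2
        System.out.println(decode(encode(-123456789L))); // -123456789
    }
}
```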
[jira] [Created] (HIVE-4922) create template for string scalar compared with string column
Eric Hanson created HIVE-4922: - Summary: create template for string scalar compared with string column Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
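The generated scalar-left/column-right classes essentially run a tight loop over a column vector and record the surviving row indices in a selection array. A simplified illustration of that pattern follows; the names and signatures are invented for this sketch and are not Hive's generated code, which also handles nulls and an incoming selection vector:

```java
public class ScalarColumnFilterDemo {

    // Keep the rows of col[0..n) for which (scalar < col[i]) holds,
    // writing their indices into selected and returning the new batch size.
    static int filterScalarLessColumn(long scalar, long[] col, int n, int[] selected) {
        int newSize = 0;
        for (int i = 0; i < n; i++) {
            if (scalar < col[i]) {
                selected[newSize++] = i; // row i survives the filter
            }
        }
        return newSize;
    }

    public static void main(String[] args) {
        long[] col = { 1, 5, 3, 9 };
        int[] sel = new int[col.length];
        int size = filterScalarLessColumn(3, col, col.length, sel);
        System.out.println(size); // 2 (rows 1 and 3 survive)
    }
}
```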
[jira] [Work started] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4922 started by Eric Hanson. create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717628#comment-13717628 ] Eric Hanson commented on HIVE-4922: --- I verified that comparisons with a scalar on the left and a column on the right run end-to-end, using these queries, which all run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' = l_shipmode; select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' l_shipmode; select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' = l_shipmode; select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' = l_shipmode; select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' l_shipmode; select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'MAIL' l_shipmode; create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4922: -- Status: Patch Available (was: In Progress) create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4922.1.patch Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4922: -- Attachment: HIVE-4922.1.patch create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4922.1.patch Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717630#comment-13717630 ] Eric Hanson commented on HIVE-4922: --- This patch was created after the following patches were applied, so to be safe, it should wait to go in until they've been applied first. HIVE-4684.3.patch HIVE-4884-.patch Hive-4909.0.patch create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4922.1.patch Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4922) create template for string scalar compared with string column
[ https://issues.apache.org/jira/browse/HIVE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717654#comment-13717654 ] Hive QA commented on HIVE-4922: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593776/HIVE-4922.1.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/153/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/153/console Messages: {noformat} Executing org.apache.hive.ptest.execution.CleanupPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-153/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/org/apache/hadoop/hive/ql/exec/TestUtilities.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1506302. At revision 1506302. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. create template for string scalar compared with string column - Key: HIVE-4922 URL: https://issues.apache.org/jira/browse/HIVE-4922 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4922.1.patch Create a template to generate classes to handle comparisons with a scalar on the left and a column on the right. This allows queries similar to the following to run vectorized: select l_orderkey, l_shipmode from lineitem_orc where l_orderkey = 1 and 'M' l_shipmode; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE
[ https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717657#comment-13717657 ] Eric Hanson commented on HIVE-4624: --- Tim will try to get this done by Thurs or else explicitly give it to somebody else to finish. Integrate Vectorzied Substr into Vectorized QE -- Key: HIVE-4624 URL: https://issues.apache.org/jira/browse/HIVE-4624 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen Assignee: Timothy Chen Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it can be leveraged. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4512) The vectorized plan is not picking right expression class for string concatenation.
[ https://issues.apache.org/jira/browse/HIVE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson reassigned HIVE-4512: - Assignee: Eric Hanson (was: Jitendra Nath Pandey) The vectorized plan is not picking right expression class for string concatenation. --- Key: HIVE-4512 URL: https://issues.apache.org/jira/browse/HIVE-4512 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson The vectorized plan is not picking right expression class for string concatenation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 12824: [HIVE-4911] Enable QOP configuration for Hive Server 2 thrift transport
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12824/#review23711 --- data/conf/hive-site.xml https://reviews.apache.org/r/12824/#comment47589 This change should go into conf/hive-default.xml.template . data/conf/hive-site.xml is meant to be used for overriding config parameters for the tests. In this case, as the default value is being used, this file does not need changing. jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/12824/#comment47597 the HIVE_AUTH_TYPE env variable is called auth. Should we use something more descriptive like sasl.qop as the variable that sets the QOP level. jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/12824/#comment47590 It is a good general practice to chain the exceptions. - throw new SQLException("Invalid " + HIVE_AUTH_TYPE + " parameter. " + e.getMessage(), "42000", e); service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java https://reviews.apache.org/r/12824/#comment47596 I think hadoop.rpc.protection being set to a higher level than hive.server2.thrift.rpc.protection does not make sense in most situations (you would want to have more security in the transport that is likely to be more insecure. The HS2 - client transport could be over a corporate-wide wi-fi network) Should we warn if such a configuration is seen ? shims/src/common-secure/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java https://reviews.apache.org/r/12824/#comment47595 This function is called from hive metastore client. Using SaslRpcServer.SASL_PROPS here means that setting hadoop.rpc.protection will determine the QOP level, if we make a call to SaslRpcServer.init(conf) from anywhere in the code. But that function is not being called. I think it makes sense to use hadoop.rpc.protection for metastore QOP, since the metastore is usually not exposed 'outside' the cluster unlike hive server2. It is often viewed as something 'inside the cluster'.
Should we change this function to take in a configuration object and use that to call SaslRpcServer.init(conf) ? - Thejas Nair On July 22, 2013, 8:56 p.m., Arup Malakar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12824/ --- (Updated July 22, 2013, 8:56 p.m.) Review request for hive. Bugs: HIVE-4911 https://issues.apache.org/jira/browse/HIVE-4911 Repository: hive-git Description --- The QoP for hive server 2 should be configurable to enable encryption. A new configuration should be exposed hive.server2.thrift.rpc.protection. This would give greater control configuring hive server 2 service. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 11c31216495d0c4e454f2627af5c93a9f270b1fe data/conf/hive-site.xml 4e6ff16135833da1a4df12a12a6fe59ad4f870ba jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 00f43511b478c687b7811fc8ad66af2b507a3626 service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java 1809e1b26ceee5de14a354a0e499aa8c0ab793bf service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 379dafb8377aed55e74f0ae18407996bb9e1216f service/src/java/org/apache/hive/service/auth/SaslQOP.java PRE-CREATION shims/src/common-secure/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java 777226f8da0af2235d4294cd6a676fa8192c89e4 shims/src/common/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java 9b0ec0a75563b41339e6fc747556440fdf83e31e Diff: https://reviews.apache.org/r/12824/diff/ Testing --- Thanks, Arup Malakar
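For context on the QOP levels this review discusses: SASL defines three protection levels (auth = authentication only, auth-int = plus integrity checking, auth-conf = plus confidentiality/encryption), and a client or server selects among them through the standard javax.security.sasl property keys. A minimal JDK-only sketch of building such a properties map (how HiveServer2 actually wires these into Thrift is defined by the patch, not shown here):

```java
import javax.security.sasl.Sasl;
import java.util.HashMap;
import java.util.Map;

public class QopPropsDemo {

    // Build a SASL properties map for the given QOP preference list,
    // e.g. "auth-conf,auth-int,auth" to prefer encryption but allow fallback,
    // or just "auth-conf" to require it.
    static Map<String, String> saslProps(String qop) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, qop);            // "javax.security.sasl.qop"
        props.put(Sasl.SERVER_AUTH, "true"); // require mutual authentication
        return props;
    }

    public static void main(String[] args) {
        System.out.println(saslProps("auth-conf,auth-int,auth"));
    }
}
```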
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717661#comment-13717661 ] Gunther Hagleitner commented on HIVE-4825: -- Surprised to see this hcat test fail - shouldn't be affected by the changes. Ran it in isolation a few times and also ran all hcat tests combined. Couldn't reproduce the issue. Fluke? Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors etc. For Tez and MRR it will be useful to break this into map and reduce specific pieces. The separation is natural and I think has value in itself, it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez where you'll have a graph of these instead of just 1M and 0-1R. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717667#comment-13717667 ] Brock Noland commented on HIVE-4825: Yes I believe that one is flaky. I added it to HIVE-4851 Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors etc. For Tez and MRR it will be useful to break this into map and reduce specific pieces. The separation is natural and I think has value in itself, it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez where you'll have a graph of these instead of just 1M and 0-1R. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4851) Fix flaky tests
[ https://issues.apache.org/jira/browse/HIVE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4851: --- Description: I see the following tests fail quite often: * TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 * TestOrcHCatLoader.testReadDataBasic * TestMinimrCliDriver.testCliDriver_bucketmpjoin6 * TestNotificationListener.testAMQListener This one is less often, but still fails randomly: * TestMinimrCliDriver.testCliDriver_bucket4 * TestHCatHiveCompatibility.testUnpartedReadWrite * TestHCatLoader.testReadPartitionedBasic * TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat * TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask was: I see the following tests fail quite often: * TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 * TestOrcHCatLoader.testReadDataBasic * TestMinimrCliDriver.testCliDriver_bucketmpjoin6 * TestNotificationListener.testAMQListener This one is less often, but still fails randomly: * TestMinimrCliDriver.testCliDriver_bucket4 * TestHCatHiveCompatibility.testUnpartedReadWrite * TestHCatLoader.testReadPartitionedBasic * TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat Fix flaky tests --- Key: HIVE-4851 URL: https://issues.apache.org/jira/browse/HIVE-4851 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland I see the following tests fail quite often: * TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 * TestOrcHCatLoader.testReadDataBasic * TestMinimrCliDriver.testCliDriver_bucketmpjoin6 * TestNotificationListener.testAMQListener This one is less often, but still fails randomly: * TestMinimrCliDriver.testCliDriver_bucket4 * TestHCatHiveCompatibility.testUnpartedReadWrite * TestHCatLoader.testReadPartitionedBasic * TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat * TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask 
Re: Review Request 12824: [HIVE-4911] Enable QOP configuration for Hive Server 2 thrift transport
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12824/#review23722 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/12824/#comment47598 should we just call this hive.server2.thrift.sasl.qop ? That seems more self describing. - Thejas Nair On July 22, 2013, 8:56 p.m., Arup Malakar wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12824/ --- (Updated July 22, 2013, 8:56 p.m.) Review request for hive. Bugs: HIVE-4911 https://issues.apache.org/jira/browse/HIVE-4911 Repository: hive-git Description --- The QoP for hive server 2 should be configurable to enable encryption. A new configuration should be exposed hive.server2.thrift.rpc.protection. This would give greater control configuring hive server 2 service. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 11c31216495d0c4e454f2627af5c93a9f270b1fe data/conf/hive-site.xml 4e6ff16135833da1a4df12a12a6fe59ad4f870ba jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 00f43511b478c687b7811fc8ad66af2b507a3626 service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java 1809e1b26ceee5de14a354a0e499aa8c0ab793bf service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 379dafb8377aed55e74f0ae18407996bb9e1216f service/src/java/org/apache/hive/service/auth/SaslQOP.java PRE-CREATION shims/src/common-secure/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java 777226f8da0af2235d4294cd6a676fa8192c89e4 shims/src/common/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java 9b0ec0a75563b41339e6fc747556440fdf83e31e Diff: https://reviews.apache.org/r/12824/diff/ Testing --- Thanks, Arup Malakar
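For reference, the kind of self-describing QOP mapping under discussion (a SaslQOP.java file appears as PRE-CREATION in the diff) can be sketched like this — a hypothetical, simplified version, not the actual patch contents. The three string values are the standard javax.security.sasl QOP levels:

```java
/** Hypothetical sketch: a config value such as "auth-conf" resolved to the
    string passed as javax.security.sasl.Sasl.QOP. Names are illustrative. */
public enum SaslQop {
  AUTH("auth"),           // authentication only
  AUTH_INT("auth-int"),   // authentication plus integrity protection
  AUTH_CONF("auth-conf"); // authentication plus confidentiality (encryption)

  public final String saslQop;

  SaslQop(String saslQop) { this.saslQop = saslQop; }

  public static SaslQop fromString(String s) {
    for (SaslQop q : values()) {
      if (q.saslQop.equalsIgnoreCase(s)) {
        return q;
      }
    }
    throw new IllegalArgumentException("Unknown SASL QOP level: " + s);
  }
}
```

A server would then put the resolved value into the SASL properties map under Sasl.QOP before opening the Thrift transport.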
[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport
[ https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717669#comment-13717669 ] Thejas M Nair commented on HIVE-4911: - [~amalakar] I added some review comments in the review board link. +1 for having a separate config flag that enables the QOP for hive server2. The HS2 - client connection is usually more vulnerable compared to the network traffic within a hadoop cluster, as the HS2 client is likely to be connecting over a corporate wide network. [~brocknoland] The patch would not work for HMS; that would need some more changes. (added a comment about that in review). But I am not sure if that needs to be part of the same jira. I don't think it makes sense to use the same config param to set the SASL QOP level for the metastore. Should we just use hadoop.rpc.protection for that, as it is usually considered 'inside the cluster' (as opposed to HS2, which is like a 'gateway server')?

Enable QOP configuration for Hive Server 2 thrift transport --- Key: HIVE-4911 URL: https://issues.apache.org/jira/browse/HIVE-4911 Project: Hive Issue Type: New Feature Reporter: Arup Malakar Assignee: Arup Malakar Attachments: HIVE-4911-trunk-0.patch The QoP for hive server 2 should be configurable to enable encryption. A new configuration should be exposed: hive.server2.thrift.rpc.protection. This would give greater control configuring the hive server 2 service. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4915) unit tests fail on windows because of difference in input file size
[ https://issues.apache.org/jira/browse/HIVE-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4915: Attachment: HIVE-4915.1.patch HIVE-4915.1.patch - update .gitattributes file to always checkout the test .dat files with unix style newlines unit tests fail on windows because of difference in input file size --- Key: HIVE-4915 URL: https://issues.apache.org/jira/browse/HIVE-4915 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4915.1.patch Several qfile based tests fail on windows because in the output of explain extended, the total file size of input files shown is different on windows. This is because by default text files on windows are checked out with two char line endings, and *.dat files used as input files for the tables are considered as text files. So for every line in the .dat file, the size of the file is larger by 1 byte on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
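The .gitattributes approach described in the patch summary typically amounts to a rule like the following — the actual patch contents aren't shown here, so treat this as an illustrative sketch:

```
# force LF newlines for test data files, so the input file sizes reported
# by "explain extended" match across Windows and Unix checkouts
*.dat text eol=lf
```

With `eol=lf`, git normalizes the files to unix newlines in the working tree regardless of the platform's `core.autocrlf` setting, so each line is one byte shorter than a CRLF checkout would be.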
[jira] [Updated] (HIVE-4915) unit tests fail on windows because of difference in input file size
[ https://issues.apache.org/jira/browse/HIVE-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4915: Status: Patch Available (was: Open) unit tests fail on windows because of difference in input file size --- Key: HIVE-4915 URL: https://issues.apache.org/jira/browse/HIVE-4915 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4915.1.patch Several qfile based tests fail on windows because in the output of explain extended, the total file size of input files shown is different on windows. This is because by default text files on windows are checked out with two char line endings, and *.dat files used as input files for the tables are considered as text files. So for every line in the .dat file, the size of the file is larger by 1 byte on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717688#comment-13717688 ] Brock Noland commented on HIVE-4388: Facts: * HBase only plans on publishing 0.95/0.96 hadoop2 artifacts * HBase 0.95/0.96 makes backwards incompatible changes * HBase 0.95/0.96 changes the coprocessor interface dramatically Based on these facts I feel it would be very difficult to support both 0.94 and 0.95/0.96 with the same source code. I see two options: # Move the hbase stuff in a versioned module similar to the hadoop shim # Upgrade trunk to 0.96 I propose we upgrade trunk to 0.95/0.96 and move on with our lives. Supporting two versions of hbase in addition to three versions of hadoop is going to be ugly quick. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4808) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly
[ https://issues.apache.org/jira/browse/HIVE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4808: - Attachment: HIVE-4808.1.patch Added a test for this case. Ran Templeton e2e tests. fork.factor.group=3 and fork.factor.conf.file=6 the suite runs in 11 minutes. Added support for timeout_seconds property in .conf files to specify custom timeout. WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly --- Key: HIVE-4808 URL: https://issues.apache.org/jira/browse/HIVE-4808 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4808.1.patch, HIVE-4808.patch (set mapred.task.timeout=7) curl -i -d user.name=ekoifman \ -d jar=/user/ekoifman/webhcate2e/hexamples.jar \ -d class=sleep \ -d arg=-mt \ -d arg=5 \ -d statusdir=/tmp \ 'http://localhost:50111/templeton/v1/mapreduce/jar' The TempletonControllerJob gets retried 4 times (Thus there are 4 SleepJob invocations) with message that it was killed due to inactivity. hexamples.jar = hadoop-examples-*.jar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717703#comment-13717703 ] Ashutosh Chauhan commented on HIVE-4838: [~brocknoland] One of the items listed in the description is: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. Have you attacked this in this patch? If yes, how did you fix it? I haven't dived into the patch to figure that out yet.

Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. * The api of a logical Table Container is not defined and therefore it's unclear what apis HashMapWrapper needs to publicize. Additionally HashMapWrapper has many unused public methods. * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be separated * HashTableSinkObjectCtx has unused fields and unused methods * CommonJoinOperator and children use ArrayList on left hand side when only List is required * There are unused classes MRU, DCLLItemm and classes which duplicate functionality MapJoinSingleKey and MapJoinDoubleKeys -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4831) QTestUtil based test exiting abnormally on windows fails startup of other QTestUtil tests
[ https://issues.apache.org/jira/browse/HIVE-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4831: Status: Patch Available (was: Open) QTestUtil based test exiting abnormally on windows fails startup of other QTestUtil tests - Key: HIVE-4831 URL: https://issues.apache.org/jira/browse/HIVE-4831 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4831.1.patch, HIVE-4831.2.patch QTestUtil tests start mini zookeeper cluster. If it exits abnormally (eg timeout), it fails to stop the zookeeper mini cluster. On Windows when the process is still running the files can't be deleted, and as a result the new zookeeper cluster started by a new QFileUtil based test case fails to start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717707#comment-13717707 ] Ashutosh Chauhan commented on HIVE-4388: I am also of same opinion, lets move forward and upgrade trunk to 0.96. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717716#comment-13717716 ] Brock Noland commented on HIVE-4838: Hey, yes I have. I'll upload an updated patch here in a few minutes. The current code uses this static state because with java serialization there is no way to pass any context information down to the class when the read/write methods are being called. In the new patch I define my own read/write methods (example below). {noformat} public void read(MapJoinObjectSerDeContext context, ObjectInputStream in, Writable container) throws IOException, SerDeException { {noformat} and use those to serialize/deserialize the objects. Specifically, in the new patch MapJoinRowContainer.read/write, MapJoinTableContainerSerDe.load/persist and MapJoinKey.read/write will be interesting.

Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. * The api of a logical Table Container is not defined and therefore it's unclear what apis HashMapWrapper needs to publicize. Additionally HashMapWrapper has many unused public methods. * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. 
Ideally these logical units could be separated * HashTableSinkObjectCtx has unused fields and unused methods * CommonJoinOperator and children use ArrayList on left hand side when only List is required * There are unused classes MRU, DCLLItemm and classes which duplicate functionality MapJoinSingleKey and MapJoinDoubleKeys -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
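The pattern Brock describes — threading an explicit context object through read/write instead of relying on static state — can be sketched roughly like this. The class names are simplified stand-ins, not the actual Hive types (MapJoinObjectSerDeContext etc.):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Stand-in for serialization metadata that used to live in static state. */
class SerDeContext {
  final boolean hasFilter;
  SerDeContext(boolean hasFilter) { this.hasFilter = hasFilter; }
}

/** A row that serializes itself using the context it is handed. */
class Row {
  int key;

  void write(SerDeContext ctx, ObjectOutputStream out) throws IOException {
    out.writeInt(key);
    if (ctx.hasFilter) {
      out.writeShort(0); // filter tag written only when the context says so
    }
  }

  void read(SerDeContext ctx, ObjectInputStream in) throws IOException {
    key = in.readInt();
    if (ctx.hasFilter) {
      in.readShort(); // consume the filter tag symmetrically
    }
  }
}
```

Because the context is an argument rather than a global, two tables with different serialization settings can be loaded in the same JVM without stepping on each other, and the classes become unit-testable in isolation.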
[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4838: --- Attachment: HIVE-4838.patch Rebased after HIVE-4845 was committed. Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive and the current code has done great service for many years. However, the code is showing it's age and currently suffers from the following issues: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. * The api of a logical Table Container is not defined and therefore it's unclear what apis HashMapWrapper needs to publicize. Additionally HashMapWrapper has many used public methods. * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be seperated * HashTableSinkObjectCtx has unused fields and unused methods * CommonJoinOperator and children use ArrayList on left hand side when only List is required * There are unused classes MRU, DCLLItemm and classes which duplicate functionality MapJoinSingleKey and MapJoinDoubleKeys -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717719#comment-13717719 ] Brock Noland commented on HIVE-4838: Updated review https://reviews.facebook.net/D11679 Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive and the current code has done great service for many years. However, the code is showing it's age and currently suffers from the following issues: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. * The api of a logical Table Container is not defined and therefore it's unclear what apis HashMapWrapper needs to publicize. Additionally HashMapWrapper has many used public methods. * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be seperated * HashTableSinkObjectCtx has unused fields and unused methods * CommonJoinOperator and children use ArrayList on left hand side when only List is required * There are unused classes MRU, DCLLItemm and classes which duplicate functionality MapJoinSingleKey and MapJoinDoubleKeys -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4822: -- Attachment: HIVE-4822.5.patch

implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5.patch Implement vectorized support for all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary functions of form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
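The "call the function directly in the inner loop" guidance amounts to a per-function expression class with a tight loop over the column array — the shape a template can stamp out once per unary function (sin, sqrt, exp, ...). A stripped-down sketch, not the actual vectorization-branch classes (no null/isRepeating handling here):

```java
/** Minimal stand-in for a vectorized double column. */
class DoubleColumnVector {
  final double[] vector;
  DoubleColumnVector(int n) { vector = new double[n]; }
}

/** One generated expression: DOUBLE -> DOUBLE applied row by row. */
class FuncSinDoubleToDouble {
  void evaluate(DoubleColumnVector in, DoubleColumnVector out, int n) {
    // Math.sin is invoked directly in the inner loop; no per-row
    // allocation or virtual dispatch beyond this one call
    for (int i = 0; i < n; i++) {
      out.vector[i] = Math.sin(in.vector[i]);
    }
  }
}
```

The real classes additionally short-circuit on `isRepeating` and skip null entries; the hot path above is what the template varies per function.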
[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working
[ https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717747#comment-13717747 ] Phabricator commented on HIVE-3926: --- hagleitn has commented on the revision HIVE-3926 [jira] PPD on virtual column of partitioned table is not working. Some minor comments + request for more info.

INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/Driver.java:600 tableScanOperator was actually a clearer name, wasn't it?
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:108 I think this is redundant. o instanceof MapInputPath is false if o == null
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java:134 can you please add a javadoc comment here?
ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:18 There are a lot of changes in MapOperator. I've scanned through them and it seems most (all?) are just cleaning up the operator (which is great). Is that correct? If not can you please point out what the important changes are?

REVISION DETAIL https://reviews.facebook.net/D8121 To: JIRA, navis Cc: hagleitn

PPD on virtual column of partitioned table is not working - Key: HIVE-3926 URL: https://issues.apache.org/jira/browse/HIVE-3926 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3926.D8121.1.patch, HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch {code} select * from src where BLOCK__OFFSET__INSIDE__FILE > 100; {code} is working, but {code} select * from srcpart where BLOCK__OFFSET__INSIDE__FILE > 100; {code} throws SemanticException. Disabling PPD makes it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
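The MapOperator.java:108 comment is easy to verify: `instanceof` already evaluates to false when its left operand is null, so a preceding `o != null` check adds nothing. A tiny demo — the method below is a hypothetical stand-in, with String in place of MapInputPath:

```java
public class InstanceofNull {
  // stands in for `o instanceof MapInputPath` in the patch under review
  static boolean isMapInputPathLike(Object o) {
    return o instanceof String;
  }

  public static void main(String[] args) {
    System.out.println(isMapInputPathLike(null));   // false, and no NPE
    System.out.println(isMapInputPathLike("path")); // true
  }
}
```

This is specified behavior of the `instanceof` operator, not an implementation quirk, which is why the extra null guard is safely removable.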
[jira] [Updated] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4822: -- Attachment: (was: HIVE-4822.5.patch) implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch Implement vectorized support for the all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITITVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary function of form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4822: -- Attachment: HIVE-4822.5-vectorization.patch implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch Implement vectorized support for the all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITITVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary function of form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717753#comment-13717753 ] Eric Hanson commented on HIVE-4822: --- I updated this patch with a small bug fix to call initBuffer() on string output vectors. I also updated the unit test accordingly. implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch Implement vectorized support for the all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITITVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary function of form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4808) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly
[ https://issues.apache.org/jira/browse/HIVE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4808: - Attachment: HIVE-4808.1.patch WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly --- Key: HIVE-4808 URL: https://issues.apache.org/jira/browse/HIVE-4808 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4808.1.patch, HIVE-4808.1.patch, HIVE-4808.patch (set mapred.task.timeout=7) curl -i -d user.name=ekoifman \ -d jar=/user/ekoifman/webhcate2e/hexamples.jar \ -d class=sleep \ -d arg=-mt \ -d arg=5 \ -d statusdir=/tmp \ 'http://localhost:50111/templeton/v1/mapreduce/jar' The TempletonControllerJob gets retried 4 times (Thus there are 4 SleepJob invocations) with message that it was killed due to inactivity. hexamples.jar = hadoop-examples-*.jar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
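The failure mode above (the controller job blocks on a child job without reporting progress, so the TaskTracker kills it once mapred.task.timeout elapses) is typically addressed with a keep-alive thread that reports progress periodically. The sketch below is a generic illustration of that pattern, not the actual HIVE-4808 patch; `Progressable` here is a local stand-in for Hadoop's reporter interface.

```java
// Generic keep-alive pattern: a daemon thread reports progress while the main
// thread blocks on long-running work, so the framework sees activity and does
// not kill the task for inactivity. Not the actual HIVE-4808 patch;
// Progressable is a stand-in for Hadoop's org.apache.hadoop.util.Progressable.
public class KeepAlive {
    public interface Progressable { void progress(); }

    // Start a daemon thread that calls progress() every intervalMs until
    // interrupted by the caller when the real work finishes.
    public static Thread start(Progressable reporter, long intervalMs) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    reporter.progress();
                    Thread.sleep(intervalMs);
                }
            } catch (InterruptedException e) {
                // stop quietly when the main work completes
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        final int[] beats = {0};
        Thread t = start(() -> beats[0]++, 10);
        Thread.sleep(100);        // simulate the blocking child job
        t.interrupt();
        System.out.println("heartbeats sent: " + beats[0]);
    }
}
```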
[jira] [Updated] (HIVE-4808) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly
[ https://issues.apache.org/jira/browse/HIVE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4808: - Attachment: (was: HIVE-4808.1.patch) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly --- Key: HIVE-4808 URL: https://issues.apache.org/jira/browse/HIVE-4808 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4808.1.patch, HIVE-4808.patch (set mapred.task.timeout=7) curl -i -d user.name=ekoifman \ -d jar=/user/ekoifman/webhcate2e/hexamples.jar \ -d class=sleep \ -d arg=-mt \ -d arg=5 \ -d statusdir=/tmp \ 'http://localhost:50111/templeton/v1/mapreduce/jar' The TempletonControllerJob gets retried 4 times (Thus there are 4 SleepJob invocations) with message that it was killed due to inactivity. hexamples.jar = hadoop-examples-*.jar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4222) Timestamp type constants cannot be deserialized in JDK 1.6 or less
[ https://issues.apache.org/jira/browse/HIVE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717782#comment-13717782 ] Hive QA commented on HIVE-4222: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12593736/HIVE-4222.D9681.3.patch {color:green}SUCCESS:{color} +1 2648 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/154/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/154/console Messages: {noformat} Executing org.apache.hive.ptest.execution.CleanupPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Timestamp type constants cannot be deserialized in JDK 1.6 or less -- Key: HIVE-4222 URL: https://issues.apache.org/jira/browse/HIVE-4222 Project: Hive Issue Type: Bug Components: Types Reporter: Navis Assignee: Navis Attachments: HIVE-4222.D9681.1.patch, HIVE-4222.D9681.2.patch, HIVE-4222.D9681.3.patch For example, {noformat} ExprNodeConstantDesc constant = new ExprNodeConstantDesc(TypeInfoFactory.timestampTypeInfo, new Timestamp(100)); String serialized = Utilities.serializeExpression(constant); ExprNodeConstantDesc deserilized = (ExprNodeConstantDesc) Utilities.deserializeExpression(serialized, new Configuration()); {noformat} logs error message {noformat} java.lang.InstantiationException: java.sql.Timestamp Continuing ... java.lang.RuntimeException: failed to evaluate: unbound=Class.new(); Continuing ... {noformat} and makes NPE in final. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4808) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly
[ https://issues.apache.org/jira/browse/HIVE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4808: - Status: Patch Available (was: Open) WebHCat job submission is killed by TaskTracker since it's not sending a heartbeat properly --- Key: HIVE-4808 URL: https://issues.apache.org/jira/browse/HIVE-4808 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4808.1.patch, HIVE-4808.patch (set mapred.task.timeout=7) curl -i -d user.name=ekoifman \ -d jar=/user/ekoifman/webhcate2e/hexamples.jar \ -d class=sleep \ -d arg=-mt \ -d arg=5 \ -d statusdir=/tmp \ 'http://localhost:50111/templeton/v1/mapreduce/jar' The TempletonControllerJob gets retried 4 times (Thus there are 4 SleepJob invocations) with message that it was killed due to inactivity. hexamples.jar = hadoop-examples-*.jar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717813#comment-13717813 ] Edward Capriolo commented on HIVE-4825: --- Huge effort. I do see what you're saying. The win is nice in that something is either MapWork or ReduceWork, and some classes no longer need to redundantly set reduce tasks to 0 when they run on the map side. Even though this patch touches many files, it pretty much touches them all in a small way that should not be too much trouble for anyone to deal with. I am +1; most of the changes this would cause would be cosmetic. I'm only trying to look out for things that Navis and Yin have on the queue. Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors etc. For Tez and MRR it will be useful to break this into map and reduce specific pieces. The separation is natural and I think has value in itself, it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez where you'll have a graph of these instead of just 1M and 0-1R. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
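The "1M and 0-1R" shape described in the issue can be pictured with skeletal classes. These are illustrative only; the field names are hypothetical and do not reflect Hive's actual MapredWork members.

```java
// Skeleton of the HIVE-4825 idea: map-specific and reduce-specific state
// split out of one monolithic MapredWork. Field names are illustrative,
// not Hive's actual members.
import java.util.HashMap;
import java.util.Map;

public class WorkSplit {
    // Map-side state: aliases, table descriptors, path-to-alias mappings, ...
    public static class MapWork {
        public final Map<String, String> aliasToWork = new HashMap<>();
    }

    // Reduce-side state: tagging info, number of reducers, ...
    public static class ReduceWork {
        public int numReduceTasks;
    }

    // An MR job is one MapWork plus an optional ReduceWork (the "1M and 0-1R"
    // shape); a Tez DAG could instead hold a graph of these vertices.
    public static class MapredWork {
        public final MapWork mapWork = new MapWork();
        public ReduceWork reduceWork;   // null for map-only jobs
        public boolean isMapOnly() { return reduceWork == null; }
    }

    public static void main(String[] args) {
        // A map-only job no longer needs to redundantly set reduce tasks to 0;
        // it simply has no ReduceWork at all.
        MapredWork mapOnly = new MapredWork();
        System.out.println(mapOnly.isMapOnly());
    }
}
```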
[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working
[ https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717828#comment-13717828 ] Phabricator commented on HIVE-3926: --- navis has commented on the revision HIVE-3926 [jira] PPD on virtual column of partitioned table is not working. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/Driver.java:600 It seems I changed this to avoid exceeding 100 characters on line 620. I'll revert this. ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:18 I should say that this patch was in patch-available state for too long (I forgot what this was). But at first look, this is much shorter than the current code and I like that. I'll add comments after reading this. ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java:108 This has been there since a long time ago. I'll remove that. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java:134 Ok, sure. REVISION DETAIL https://reviews.facebook.net/D8121 To: JIRA, navis Cc: hagleitn PPD on virtual column of partitioned table is not working - Key: HIVE-3926 URL: https://issues.apache.org/jira/browse/HIVE-3926 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3926.D8121.1.patch, HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch {code} select * from src where BLOCK__OFFSET__INSIDE__FILE<100; {code} is working, but {code} select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100; {code} throws SemanticException. Disabling PPD makes it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4642: - Attachment: HIVE-4642.3.patch.txt Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
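Optimizing RLIKE/REGEXP "at least for the common cases" usually means dispatching on the pattern's shape (prefix, suffix, contains) before falling back to the full regex engine. The checker below is an assumed illustration of that idea, not the code in the attached HIVE-4642 patches.

```java
// Illustrative sketch: pick a cheap matcher for common pattern shapes and fall
// back to java.util.regex only for general patterns. This mirrors the
// "optimize the common cases" goal; it is not the code in HIVE-4642's patch.
import java.util.regex.Pattern;

public class LikeDispatch {
    // Dispatch on pattern shape: "abc.*" -> prefix match, ".*abc" -> suffix
    // match, ".*abc.*" -> substring match; anything else -> full regex.
    public static boolean matches(String value, String pattern) {
        if (pattern.length() >= 4 && pattern.startsWith(".*") && pattern.endsWith(".*")) {
            String middle = pattern.substring(2, pattern.length() - 2);
            if (isLiteral(middle)) {
                return value.contains(middle);
            }
        }
        if (pattern.endsWith(".*") && isLiteral(pattern.substring(0, pattern.length() - 2))) {
            return value.startsWith(pattern.substring(0, pattern.length() - 2));
        }
        if (pattern.startsWith(".*") && isLiteral(pattern.substring(2))) {
            return value.endsWith(pattern.substring(2));
        }
        return Pattern.matches(pattern, value);   // general fallback
    }

    // A pattern chunk is a plain literal if it contains no regex metacharacters.
    private static boolean isLiteral(String s) {
        return s.chars().noneMatch(c -> "\\.[]{}()*+?^$|".indexOf(c) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(matches("hadoop", "had.*"));   // prefix fast path
    }
}
```

The fast paths avoid compiling and running a regex per row, which matters in a vectorized inner loop over a whole column batch.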
[jira] [Commented] (HIVE-4642) Implement vectorized RLIKE and REGEXP filter expressions
[ https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717829#comment-13717829 ] Teddy Choi commented on HIVE-4642: -- Hello Eric. I uploaded a patch and I will upload its design specification tonight. It has more detailed comments and tests. It also incorporates your review. I'm sorry for being late. Implement vectorized RLIKE and REGEXP filter expressions Key: HIVE-4642 URL: https://issues.apache.org/jira/browse/HIVE-4642 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-4642-1.patch, HIVE-4642.2.patch, HIVE-4642.3.patch.txt See title. I will add more details next week. The goal is (a) make this work correctly and (b) optimize it as well as possible, at least for the common cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4660) Let there be Tez
[ https://issues.apache.org/jira/browse/HIVE-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717840#comment-13717840 ] Edward Capriolo commented on HIVE-4660: --- Thanks for uploading that. I am still getting up to speed a bit, silly question: I am looking through the Tez source code and attempting to understand its basic optimizations. I am looking at GroupByOrderByMRRTest. /** * Simple example that does a GROUP BY ORDER BY in an MRR job * Consider a query such as * Select DeptName, COUNT(*) as cnt FROM EmployeeTable * GROUP BY DeptName ORDER BY cnt; I notice that this test essentially runs the job with a single reducer. job.setNumReduceTasks(1); /** * Shuffle ensures ordering based on count of employees per department * hence the final reducer is a no-op and just emits the department name * with the employee count per department. */ What mechanism makes the above optimization happen? Do all shuffles have a natural total order sort with Tez? Let there be Tez Key: HIVE-4660 URL: https://issues.apache.org/jira/browse/HIVE-4660 Project: Hive Issue Type: New Feature Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Tez is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. Here's the project's page: http://incubator.apache.org/projects/tez.html The interesting thing about Tez from Hive's perspective is that it will over time allow us to overcome inefficiencies in query processing due to having to express every algorithm in the map-reduce paradigm. The barrier to entry is pretty low as well: Tez can actually run unmodified MR jobs; but as a first step we can without much trouble start using more of Tez' features by taking advantage of the MRR pattern. MRR simply means that there can be any number of reduce stages following a single map stage - without having to write intermediate results to HDFS and re-read them in a new job. 
This is common when queries require multiple shuffles on keys without correlation (e.g.: join - grp by - window function - order by) For more details see the design doc here: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
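The MRR data flow from GroupByOrderByMRRTest (GROUP BY in the first reduce stage, ORDER BY cnt via the shuffle into the second) can be modeled in-memory. The toy pipeline below illustrates why the query needs two shuffle stages; it is only a model of the data flow, not Tez or Hadoop code.

```java
// Toy in-memory model of the MRR flow in GroupByOrderByMRRTest: the map and
// first reduce do GROUP BY DeptName -> COUNT(*), and the shuffle into the
// second reduce sorts by count, so the final stage is essentially a no-op
// emitter. This models the data flow only; it is not Tez code.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MrrModel {
    public static List<Map.Entry<String, Long>> deptCountsOrdered(List<String> depts) {
        // Stage 1 (map + first reduce): GROUP BY DeptName -> COUNT(*)
        Map<String, Long> counts = depts.stream()
            .collect(Collectors.groupingBy(d -> d, Collectors.counting()));
        // Stage 2 (second shuffle/reduce): ORDER BY cnt; in MR this would be a
        // separate job writing to HDFS in between, in MRR it is one more stage.
        return counts.entrySet().stream()
            .sorted(Map.Entry.comparingByValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("eng", "sales", "eng", "hr", "eng", "sales");
        System.out.println(deptCountsOrdered(rows));
    }
}
```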