[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119453#comment-14119453 ] Hive QA commented on HIVE-7943: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666082/HIVE-7943.1.patch {color:green}SUCCESS:{color} +1 6142 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/611/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/611/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-611/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12666082 hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization has not been adapted to this change, so table owners do not inherit the privileges set in the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 
14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
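The expectation in this report can be stated as a simple check: when the owner grants are set to ALL, the owner of temp_table should pass the 'Drop' check. The Java sketch below models only that expectation. The Privilege enum is an illustrative subset, the comma-separated value format (for example select,drop) is an assumption, and none of these names are Hive's actual authorization classes.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the privileges that
// hive.security.authorization.createtable.owner.grants should give a table owner.
final class OwnerGrantsSketch {
    // Illustrative subset of privileges, not Hive's full list.
    enum Privilege { ALL, SELECT, UPDATE, CREATE, DROP, ALTER }

    // Parses a value such as "ALL" or "select,drop" (format assumed).
    // Unknown names make Privilege.valueOf throw IllegalArgumentException.
    static Set<Privilege> parse(String configValue) {
        Set<Privilege> grants = EnumSet.noneOf(Privilege.class);
        if (configValue == null) {
            return grants;
        }
        for (String token : configValue.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                grants.add(Privilege.valueOf(name.toUpperCase(Locale.ROOT)));
            }
        }
        return grants;
    }

    // ALL implies every other privilege.
    static boolean ownerHas(Set<Privilege> ownerGrants, Privilege required) {
        return ownerGrants.contains(Privilege.ALL) || ownerGrants.contains(required);
    }
}
```

Under this model, ownerHas(parse("ALL"), Privilege.DROP) is true, which is the outcome the reporter expected for drop table temp_table.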
Re: Review Request 25245: Support dynamic service discovery for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/#review52116 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/25245/#comment90858 The zk_ prefix seems redundant for a location in ZooKeeper. jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/25245/#comment90880 As this is a param in the HiveServer2 JDBC URL, I think the hive.server2. part is redundant; it makes the URL unnecessarily verbose. I realize we have two params which have this prefix, but I think we should remove it from those as well (in another JIRA). jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/25245/#comment90881 I think people are likely to want to implement other modes of dynamically picking the HS2 host. For example, you could simply have multiple HS2 hostnames in a URL (instead of ZooKeeper hosts), or people might decide to store the hostnames somewhere other than ZooKeeper. So instead of making this param a boolean, I think it is better to have its value be none (default) or zookeeper. Maybe also change the param name to service.discovery.mode? jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/25245/#comment90859 Comment moved to the wrong line? jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/25245/#comment90883 I think we can still make use of the Java URI class for parameter parsing by just parsing the hostname portion first. Custom parsing of params in this mode can introduce bugs or inconsistencies. JdbcConnectionParams can be expanded to hold a list of hosts. Utils.parseURL can first extract and substitute the multiple hostnames (if any), and then use the regular Java URI parsing. After parsing, we can validate whether the current discovery mode supports multiple hosts. - Thejas Nair On Sept. 
2, 2014, 10:05 a.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/ --- (Updated Sept. 2, 2014, 10:05 a.m.) Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair. Bugs: HIVE-7935 https://issues.apache.org/jira/browse/HIVE-7935 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7935 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java 59294b1 service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7 service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 21c33bc service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 027931e service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69 service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java 66fc1fc Diff: https://reviews.apache.org/r/25245/diff/ Testing --- Manual testing + test cases. Thanks, Vaibhav Gumashta
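The parsing suggestion above (extract the host list first, then hand the rest of the URL to java.net.URI, then validate against the discovery mode) could look roughly like the standalone Java sketch below. It does not use Hive's Utils.parseURL or JdbcConnectionParams; the placeholder authority and the service.discovery.mode session variable (the name the reviewer floated) are assumptions for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: hand-parse only the comma-separated host list, then let
// java.net.URI handle path, session variables, query and fragment.
final class MultiHostUrlSketch {
    // Instead of a boolean flag: none (default) or zookeeper.
    enum DiscoveryMode { NONE, ZOOKEEPER }

    static final String PREFIX = "jdbc:hive2://";
    // Single authority substituted for the host list before URI parsing (assumption).
    static final String PLACEHOLDER = "placeholder:10000";

    static final class Parsed {
        final List<String> hosts;
        final URI uri;
        Parsed(List<String> hosts, URI uri) {
            this.hosts = hosts;
            this.uri = uri;
        }
    }

    static Parsed parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a hive2 JDBC URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        String authority = slash < 0 ? rest : rest.substring(0, slash);
        String tail = slash < 0 ? "" : rest.substring(slash);
        List<String> hosts = Collections.unmodifiableList(Arrays.asList(authority.split(",")));
        // Everything after the hosts goes through the regular URI parser.
        URI uri = URI.create("hive2://" + PLACEHOLDER + tail);
        return new Parsed(hosts, uri);
    }

    // Reads the (hypothetical) service.discovery.mode session variable from the path.
    static DiscoveryMode discoveryMode(URI uri) {
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String part : path.split(";")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2 && kv[0].equals("service.discovery.mode")) {
                return DiscoveryMode.valueOf(kv[1].trim().toUpperCase(Locale.ROOT));
            }
        }
        return DiscoveryMode.NONE;
    }

    // Post-parse validation: several hosts only make sense with a discovery mode.
    static void validate(Parsed parsed) {
        if (parsed.hosts.size() > 1 && discoveryMode(parsed.uri) == DiscoveryMode.NONE) {
            throw new IllegalArgumentException("Multiple hosts require a service discovery mode");
        }
    }
}
```

For example, jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;service.discovery.mode=zookeeper yields three hosts and the ZOOKEEPER mode, while the same host list without the mode fails validation.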
Re: Review Request 25245: Support dynamic service discovery for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/#review52139 --- jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java https://reviews.apache.org/r/25245/#comment90887 How big are the ZooKeeper jars? If we use ZooKeeper in this class, I believe the ZooKeeper jars will always be needed for the JDBC driver. It would be better to have the ZooKeeper service discovery code in a separate util class; that way the ZooKeeper jars will be needed only if this mode is used. service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90889 It would be useful to log at info level that it is re-using the existing znode. service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90890 It would be useful to have HS2 de-register itself if it gets a kill signal it can handle. That part can also be done in a follow-up JIRA (until then, the admin will need to manually edit the ZooKeeper entry). - Thejas Nair
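The two structural suggestions in this review (keep ZooKeeper code behind a separately loaded class, and de-register on shutdown) can be sketched in plain Java as follows. The resolver interface and the class name org.example.hive.jdbc.ZooKeeperHostResolver are hypothetical; a JVM shutdown hook runs on normal termination signals such as SIGTERM or SIGINT, but not on SIGKILL.

```java
// Hypothetical sketch: (1) load ZooKeeper-dependent code by name only when that
// mode is requested, (2) register a JVM shutdown hook that de-registers the server.
final class DiscoveryWiringSketch {
    interface HostResolver {
        String resolve(String hostSpec);
    }

    // Hypothetical class; it would be the only one referencing ZooKeeper APIs.
    static final String ZK_RESOLVER_CLASS = "org.example.hive.jdbc.ZooKeeperHostResolver";

    static HostResolver forMode(String mode) {
        if ("zookeeper".equalsIgnoreCase(mode)) {
            try {
                // Loaded by name, so ZooKeeper jars are needed only on this path.
                Class<?> cls = Class.forName(ZK_RESOLVER_CLASS);
                return (HostResolver) cls.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("ZooKeeper discovery classes could not be loaded", e);
            }
        }
        // Mode "none": connect to the host exactly as written in the URL.
        return hostSpec -> hostSpec;
    }

    // Runs on normal JVM shutdown (e.g. SIGTERM, SIGINT), not on SIGKILL.
    static Thread deregisterOnShutdown(Runnable deregister) {
        Thread hook = new Thread(deregister, "hs2-znode-deregister");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Because the ZooKeeper path is only reached through Class.forName, a driver used in "none" mode never needs the ZooKeeper classes on its classpath in this design.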
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Attachment: HIVE-5760.91.patch Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, HIVE-5760.91.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations.
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Status: Patch Available (was: In Progress)
Re: Review Request 25288: HIVE-7941: add unit test case for line wrapping
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25288/#review52145 --- beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java https://reviews.apache.org/r/25288/#comment90895 Thanks for adding the test case for truncation. Can you also add one for the line wrapping? - Thejas Nair On Sept. 3, 2014, 5:48 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25288/ --- (Updated Sept. 3, 2014, 5:48 a.m.) Review request for hive. Repository: hive-git Description --- HIVE-7941: add unit test case for line wrapping Diffs - beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java PRE-CREATION Diff: https://reviews.apache.org/r/25288/diff/ Testing --- UT Thanks, cheng xu
Re: Review Request 25288: HIVE-7941: add unit test case for line wrapping
On Sept. 3, 2014, 7:30 a.m., Thejas Nair wrote: beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java, line 45 https://reviews.apache.org/r/25288/diff/1/?file=674876#file674876line45 Thanks for adding the test case for truncation. Can you also add one for the line wrapping? I mean, when a long input line is printed across multiple lines. - Thejas
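A line-wrapping test in the spirit of this request would assert that a value wider than its column comes back as several output lines. The minimal Java sketch below assumes simple fixed-width character wrapping; wrap() is a hypothetical helper, not a BeeLine method, and the real TableOutputFormat may wrap differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a value into chunks no wider than the column width,
// one chunk per printed line.
final class WrapSketch {
    static List<String> wrap(String value, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < value.length(); start += width) {
            lines.add(value.substring(start, Math.min(value.length(), start + width)));
        }
        if (lines.isEmpty()) {
            lines.add(""); // an empty value still occupies one row
        }
        return lines;
    }
}
```

A test would then compare, for example, wrap("abcdefghij", 4) against the three lines "abcd", "efgh" and "ij".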
[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7946: --- Component/s: CBO CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.patch
[jira] [Updated] (HIVE-5775) Introduce Cost Based Optimizer to Hive
[ https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-5775: --- Component/s: CBO Introduce Cost Based Optimizer to Hive -- Key: HIVE-5775 URL: https://issues.apache.org/jira/browse/HIVE-5775 Project: Hive Issue Type: New Feature Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: CBO-2.pdf, HIVE-5775.1.patch
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119512#comment-14119512 ] Hive QA commented on HIVE-7223: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666093/HIVE-7223.4.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6145 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/612/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/612/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-612/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12666093 Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, HIVE-7223.4.patch
Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using List<Partition>. E.g.
{code}
public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
public int add_partitions(List<Partition> new_parts);
{code}
Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition-values, etc. Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense, in both time and heap space, of serializing and deserializing metadata for large sets of partitions. Reducing the Thrift traffic should help in this regard.
In a date-partitioned table, all sub-partitions for a particular date are *likely* (but not guaranteed) to have:
# The same base directory (e.g. {{/feeds/search/20140601/}})
# Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
# The same SerDe/StorageHandler/IOFormat classes
# Sorting/Bucketing/SkewInfo settings
In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition-list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory.
We can go one better for the {{add_partitions()}} case: when adding all partitions for a given date, the “normal” case affords us the ability to specify the top-level date-directory, where sub-partitions can be inferred from the HDFS directory-path.
These extensions are hard to introduce at the metastore level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help:
{code}
public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
public int add_partitions( PartitionSpec new_parts ) throws … ;
{code}
where the PartitionSpec looks like:
{code}
public interface PartitionSpec {
  public List<Partition> getPartitions();
  public List<String> getPartNames();
  public Iterator<Partition> getPartitionIter();
  public Iterator<String> getPartNameIter();
}
{code}
For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them.
Similarly, list_partitions() could return a List<PartitionSpec>, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore needs to instantiate all partitions at once, which should help with memory requirements.
In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which
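To make the proposal concrete, here is a minimal Java sketch of a directory-based spec in the spirit of the proposed {{HDFSDirBasedPartitionSpec}}: it keeps a single root StorageDescriptor plus a list of sub-directory names, and only builds Partition objects while iterating. Partition and StorageDescriptor here are simplified, hypothetical stand-ins (not the Thrift-generated metastore classes), and a single partition key per spec is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins used only to show one shared descriptor
// backing many lazily materialized partitions.
final class PartitionSpecSketch {
    static final class StorageDescriptor {
        final String location;
        final String serde;
        StorageDescriptor(String location, String serde) {
            this.location = location;
            this.serde = serde;
        }
    }

    static final class Partition {
        final List<String> values;
        final StorageDescriptor sd;
        Partition(List<String> values, StorageDescriptor sd) {
            this.values = values;
            this.sd = sd;
        }
    }

    interface PartitionSpec {
        List<Partition> getPartitions();
        List<String> getPartNames();
        Iterator<Partition> getPartitionIter();
        Iterator<String> getPartNameIter();
    }

    // One root descriptor plus sub-directory names; partitions are built on demand.
    static final class DirBasedPartitionSpec implements PartitionSpec {
        private final StorageDescriptor rootSd; // e.g. /feeds/search/20140601/
        private final String partKey;           // e.g. "country"
        private final List<String> subDirs;     // e.g. [US, UK, IN]

        DirBasedPartitionSpec(StorageDescriptor rootSd, String partKey, List<String> subDirs) {
            this.rootSd = rootSd;
            this.partKey = partKey;
            this.subDirs = Collections.unmodifiableList(new ArrayList<>(subDirs));
        }

        @Override
        public Iterator<Partition> getPartitionIter() {
            Iterator<String> dirs = subDirs.iterator();
            return new Iterator<Partition>() {
                @Override public boolean hasNext() { return dirs.hasNext(); }
                @Override public Partition next() {
                    String dir = dirs.next();
                    // Only the location differs from the shared root descriptor.
                    StorageDescriptor sd = new StorageDescriptor(rootSd.location + dir, rootSd.serde);
                    return new Partition(Collections.singletonList(dir), sd);
                }
            };
        }

        @Override
        public Iterator<String> getPartNameIter() {
            Iterator<String> dirs = subDirs.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { return dirs.hasNext(); }
                @Override public String next() { return partKey + "=" + dirs.next(); }
            };
        }

        @Override
        public List<Partition> getPartitions() {
            List<Partition> out = new ArrayList<>();
            getPartitionIter().forEachRemaining(out::add);
            return out;
        }

        @Override
        public List<String> getPartNames() {
            List<String> out = new ArrayList<>();
            getPartNameIter().forEachRemaining(out::add);
            return out;
        }
    }
}
```

A caller that only walks getPartitionIter() never holds more than one materialized Partition at a time, which is the memory benefit the proposal is after; getPartitions() is kept only for callers that need the full list.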
[jira] [Updated] (HIVE-7324) CBO: provide a mechanism to test CBO features based on table stats only (w/o table data)
[ https://issues.apache.org/jira/browse/HIVE-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7324: --- Component/s: CBO CBO: provide a mechanism to test CBO features based on table stats only (w/o table data) Key: HIVE-7324 URL: https://issues.apache.org/jira/browse/HIVE-7324 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7324.1.patch, HIVE-7324.2.patch
Since a lot of the CBO work is focused on planning, it would be nice to be able to run EXPLAIN on a query to test CBO features. TPCDS has a rich enough schema and query set, so the patch loads a dump of TPCDS (scale 1) stats.
1. TestCBO shows a way to load stats from a dump and run EXPLAIN on a TPCDS query. The output is currently dumped to System.out. This can be improved by hooking into QTestUtil, but hopefully this is a good start.
2. Testing this uncovered a couple of issues:
a) PartitionPruner fails on 'true' constants. For example, you will get an error for
{code:sql}
SELECT * FROM t WHERE partCol 100 AND true
{code}
This gets exposed because the predicates coming out of Optiq can contain 'true' predicates.
b) OpTraitsRulesProcFactory:checkBucketedTable checks that the number of files = numBuckets. This fails because there are no data files, so I have altered it to catch exceptions and assume bucketMapJoinConvertible = false if an exception is encountered here.
Uploading with these changes in this patch for now; I will carve them out as separate patches. [~ashutoshc], [~hagleitn] can you please take a look?
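The fallback described in (b) amounts to "if the file count cannot be determined, answer not convertible". A minimal Java sketch of that pattern follows; the Callable that counts data files is a hypothetical stand-in for whatever checkBucketedTable actually inspects, not Hive's code.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the described fallback: if counting data files fails
// (e.g. a stats-only table with no data), report "not convertible" instead of
// failing the whole plan.
final class BucketCheckSketch {
    static boolean bucketMapJoinConvertible(Callable<Integer> countDataFiles, int numBuckets) {
        try {
            // Convertible only when each bucket maps to exactly one file.
            return countDataFiles.call() == numBuckets;
        } catch (Exception e) {
            // Any failure (missing files, null count, I/O error) means: assume false.
            return false;
        }
    }
}
```

The design choice is conservative: answering false only forgoes the bucket map join optimization, while the query itself still plans and runs.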
[jira] [Updated] (HIVE-7280) CBO V1
[ https://issues.apache.org/jira/browse/HIVE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7280: --- Component/s: CBO CBO V1 -- Key: HIVE-7280 URL: https://issues.apache.org/jira/browse/HIVE-7280 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7280.patch
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7889.4.patch Rebased Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION, and fixes an error in STATS on the metastore.
[jira] [Created] (HIVE-7956) When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]
Rui Li created HIVE-7956: Summary: When inserting into a bucketed table, all data goes to a single bucket [Spark Branch] Key: HIVE-7956 URL: https://issues.apache.org/jira/browse/HIVE-7956 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li I created a bucketed table: {code} create table testBucket(x int,y string) clustered by(x) into 10 buckets; {code} Then I ran a query like: {code} set hive.enforce.bucketing = true; insert overwrite table testBucket select intCol,stringCol from src; {code} Here {{src}} is a simple textfile-based table containing 4000 records (not bucketed). The query launches 10 reduce tasks, but all the data goes to only one of them.
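For comparison, the expected spread can be computed directly. The Java sketch below assumes Hive's classic bucketing scheme for an int column (the value's hash with the sign bit cleared, modulo the number of buckets). The keys 0..3999 are illustrative only, since the actual intCol values in src are not given; with distinct keys like these, each of the 10 buckets would receive 400 rows, so a single reducer receiving everything suggests the rows are not being partitioned on the bucket number.

```java
// Sketch of the classic Hive bucket assignment (assumed): hash the clustering
// column, clear the sign bit, and take the result modulo the number of buckets.
final class BucketAssignmentSketch {
    static int bucketFor(int key, int numBuckets) {
        int hash = Integer.hashCode(key); // for an int key the hash is the value itself
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Counts how many keys land in each bucket.
    static int[] histogram(int[] keys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key : keys) {
            counts[bucketFor(key, numBuckets)]++;
        }
        return counts;
    }
}
```

Clearing the sign bit keeps negative keys from producing negative bucket indexes; for example, bucketFor(-1, 10) is 2147483647 % 10, which is 7.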
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119667#comment-14119667 ] Hive QA commented on HIVE-7944: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666101/HIVE-7944.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6142 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/613/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/613/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-613/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12666101 current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Reporter: pengcheng xiong Assignee: pengcheng xiong Attachments: HIVE-7944.1.patch We previously worked on making updates of column stats for a partition of a table faster (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments.
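The symptom (correct on the first run, duplicates afterwards) is what an insert-only write path produces when the same stats are written twice. The in-memory Java model below contrasts insert-only writes with update-or-insert keyed on (partition, column). It is a hypothetical illustration of the failure mode, not the metastore's schema or the fix in HIVE-7944.1.patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of per-partition column stats, contrasting
// insert-only writes (duplicates on a second run) with update-or-insert.
final class ColumnStatsSketch {
    static final class StatsRow {
        final String partName;
        final String column;
        long numNulls;
        StatsRow(String partName, String column, long numNulls) {
            this.partName = partName;
            this.column = column;
            this.numNulls = numNulls;
        }
    }

    final List<StatsRow> rows = new ArrayList<>();

    // Always appends: a second stats run leaves two rows for the same key.
    void insertOnly(String partName, String column, long numNulls) {
        rows.add(new StatsRow(partName, column, numNulls));
    }

    // Replaces the existing row for (partName, column) if present, else appends.
    void upsert(String partName, String column, long numNulls) {
        for (StatsRow row : rows) {
            if (row.partName.equals(partName) && row.column.equals(column)) {
                row.numNulls = numNulls;
                return;
            }
        }
        rows.add(new StatsRow(partName, column, numNulls));
    }

    long count(String partName, String column) {
        return rows.stream()
                .filter(r -> r.partName.equals(partName) && r.column.equals(column))
                .count();
    }
}
```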
[jira] [Updated] (HIVE-7794) Enable tests on Spark branch (4) [Sparch Branch]
[ https://issues.apache.org/jira/browse/HIVE-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7794: --- Status: Open (was: Patch Available) Enable tests on Spark branch (4) [Sparch Branch] Key: HIVE-7794 URL: https://issues.apache.org/jira/browse/HIVE-7794 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Chinna Rao Lalam Attachments: HIVE-7794-spark.patch This jira is to enable *most* of the tests below. If tests don't pass because of some unsupported feature, ensure that a JIRA exists and move on. {noformat} vector_cast_constant.q,\ vector_data_types.q,\ vector_decimal_aggregate.q,\ vector_left_outer_join.q,\ vector_string_concat.q,\ vectorization_12.q,\ vectorization_13.q,\ vectorization_14.q,\ vectorization_15.q,\ vectorization_9.q,\ vectorization_part_project.q,\ vectorization_short_regress.q,\ vectorized_mapjoin.q,\ vectorized_nested_mapjoin.q,\ vectorized_ptf.q,\ vectorized_shufflejoin.q,\ vectorized_timestamp_funcs.q {noformat}
[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119727#comment-14119727 ] Hive QA commented on HIVE-6847: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666115/HIVE-6847.9.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6134 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.TestJdbcWithMiniMr.org.apache.hive.jdbc.TestJdbcWithMiniMr org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.org.apache.hive.service.TestHS2ImpersonationWithRemoteMS {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/614/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/614/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-614/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12666115 Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch, HIVE-6847.6.patch, HIVE-6847.7.patch, HIVE-6847.8.patch, HIVE-6847.9.patch Currently, the Hive server creates the scratch directory and changes its permissions to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. 
Also refer to the first iteration of the HIVE-6782 patch for the approach.
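Hive's scratch directories normally live on HDFS and are managed through the Hadoop FileSystem API; the Java sketch below only illustrates the permission model on a local POSIX file system with java.nio. The layout (one child directory per user under a shared root) is an assumption for illustration, not the approach taken in the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Local-filesystem sketch (assumes a POSIX file system): a per-user scratch
// directory with owner-only permissions instead of one shared 777 directory.
final class ScratchDirSketch {
    static Path userScratchDir(Path root, String user) throws IOException {
        Path dir = root.resolve(user);
        Files.createDirectories(dir);
        // rwx------ (700): only the owning user can list, read or write.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        return dir;
    }
}
```

Setting the permissions explicitly after creation avoids depending on the process umask, which would otherwise decide the mode of the new directory.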
[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7826: - Attachment: HIVE-7826.7.patch .7 is rebased. Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: TODOC14, tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch
It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in Hive require a full table scan of the large table, though, because partition pruning takes place before the corresponding values are known.
On Tez it's relatively straightforward to send the values needed for pruning to the application master, where splits are generated and tasks are submitted. Using these values, we can strip out any unneeded partitions dynamically while the query is running.
The approach is straightforward:
- Insert synthetic conditions for each join, representing x in (keys of other side in join)
- These conditions will be pushed as far down as possible
- If the condition hits a table scan and the column involved is a partition column:
-- Set up an operator to send key events to the AM
- else:
-- Remove the synthetic predicate
Add these properties:
||Property||Default Value||
|{{hive.tez.dynamic.partition.pruning}}|true|
|{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
|{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
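The pruning step itself reduces to "keep only the partitions whose value appears among the join keys received at runtime". A minimal Java sketch of that step follows. The names are hypothetical, and the size guard reflects an assumed reading of hive.tez.dynamic.partition.pruning.max.data.size (when the collected events are too large, skip pruning and scan everything), not the actual Tez/Hive code path.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keys collected from the small side of the join at runtime
// are used to drop partitions of the large table before splits are generated.
final class DynamicPruningSketch {
    static List<String> prune(Collection<String> partitionValues, Set<String> joinKeys) {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (joinKeys.contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    // Assumed semantics of the max data size: if the key events are too large,
    // skip pruning and scan every partition (always correct, just slower).
    static List<String> pruneOrScanAll(Collection<String> partitionValues, Set<String> joinKeys,
                                       long eventBytes, long maxDataSize) {
        if (eventBytes > maxDataSize) {
            return new ArrayList<>(partitionValues);
        }
        return prune(partitionValues, joinKeys);
    }
}
```

Falling back to a full scan is the safe default: pruning is purely an optimization, so skipping it can cost time but never changes the query result.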
[jira] [Resolved] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-7826. -- Resolution: Fixed
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119743#comment-14119743 ] Gunther Hagleitner commented on HIVE-7826: -- Committed to branch. Thanks [~vikram.dixit]. [~damien.carol], thanks for trying it out. Let me know if you're still having problems with this; I'll address them in a follow-up if need be.
[jira] [Created] (HIVE-7957) Revisit event version handling in dynamic partition pruning on Tez
Gunther Hagleitner created HIVE-7957: Summary: Revisit event version handling in dynamic partition pruning on Tez Key: HIVE-7957 URL: https://issues.apache.org/jira/browse/HIVE-7957 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner
Once TEZ-1447 is resolved, we should be able to simplify the handling of event versions.
[jira] [Commented] (HIVE-7924) auto_sortmerge_join_8 sometimes fails with OOM
[ https://issues.apache.org/jira/browse/HIVE-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119750#comment-14119750 ] Gunther Hagleitner commented on HIVE-7924: -- LGTM +1. Won't hurt; did you see it fix the issue, though?
auto_sortmerge_join_8 sometimes fails with OOM -- Key: HIVE-7924 URL: https://issues.apache.org/jira/browse/HIVE-7924 Project: Hive Issue Type: Test Components: Tests Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7294.patch
In some runs of this test, the following appeared in the [log|http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-572/failed/TestCliDriver-rcfile_merge1.q-fileformat_text.q-stats2.q-and-12-more/hive.log]:
{noformat}
(MapredLocalTask.java:executeInProcess(321)) - Hive Runtime Error: Map local work exhausted memory
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2014-08-29 08:31:56 Processing rows:4 Hashtable size: 3 Memory usage: 1531884480 percentage: 0.802
	at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:251)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
{noformat}
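The exception above reports heap usage as a fraction of the available heap. A minimal sketch of that kind of guard follows, assuming the reported percentage is used heap divided by maximum heap; the class, the exception type and the `threshold` parameter are hypothetical and this is not the actual `MapJoinMemoryExhaustionHandler` code. Heap figures come from the JDK's `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class MemoryExhaustionCheckSketch {

    /** Hypothetical exception thrown when a hash table build uses too much heap. */
    static class MemoryExhaustedException extends RuntimeException {
        MemoryExhaustedException(String message) { super(message); }
    }

    /** Fraction of the maximum heap that is currently in use. */
    static double usedFraction(long usedBytes, long maxBytes) {
        return (double) usedBytes / maxBytes;
    }

    /** Throws if the used fraction exceeds the given (assumed) threshold. */
    static void check(long usedBytes, long maxBytes, double threshold) {
        double fraction = usedFraction(usedBytes, maxBytes);
        if (fraction > threshold) {
            throw new MemoryExhaustedException(String.format(
                "Memory usage: %d percentage: %.3f exceeds %.2f", usedBytes, fraction, threshold));
        }
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (heap.getMax() > 0) { // getMax() returns -1 when the maximum is undefined
            System.out.printf("heap used fraction: %.3f%n", usedFraction(heap.getUsed(), heap.getMax()));
        }
        // Figures from the log above: 1531884480 bytes at 0.802 imply a max heap of ~1.91e9 bytes.
        System.out.printf("implied max heap: ~%.2f GB%n", 1531884480 / 0.802 / 1e9);
    }
}
```

Under that assumption, the logged numbers imply the local task ran with roughly 1.9 GB of heap, so a threshold below 0.802 must have been in effect for this failure.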
[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner
[ https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119761#comment-14119761 ] Hive QA commented on HIVE-7949: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666116/HIVE-7949.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6142 tests executed *Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/615/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/615/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-615/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12666116
Create table LIKE command doesn't set new owner --- Key: HIVE-7949 URL: https://issues.apache.org/jira/browse/HIVE-7949 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0, 0.13.1 Reporter: Pala M Muthaia Assignee: Pala M Muthaia Fix For: 0.13.0, 0.13.1 Attachments: HIVE-7949.1.patch
The 'create table like' command doesn't set the current user as the owner of the new table; instead, the new table's owner is the same as the source table's owner. This is a regression from 0.12.
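The expected behavior described in HIVE-7949 can be sketched as: copy the source table's schema, but record the user running the command as owner. The `TableMeta` record and `createTableLike()` helper are hypothetical simplifications, not Hive's metastore `Table` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CreateTableLikeSketch {

    /** Hypothetical, simplified table metadata: name, owner and column name -> type. */
    record TableMeta(String name, String owner, Map<String, String> columns) {}

    /**
     * Copies the source schema into a new table definition, but sets the owner to
     * the current user instead of inheriting the source table's owner.
     */
    static TableMeta createTableLike(TableMeta source, String newName, String currentUser) {
        return new TableMeta(newName, currentUser, new LinkedHashMap<>(source.columns()));
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "int");
        cols.put("value", "string");
        TableMeta src = new TableMeta("src_table", "alice", cols);
        TableMeta copy = createTableLike(src, "copy_table", "bob");
        System.out.println(copy.name() + " owned by " + copy.owner()); // copy_table owned by bob
    }
}
```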
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119865#comment-14119865 ] Hive QA commented on HIVE-7208: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666124/HIVE-7208.02.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6142 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_timestamp
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/617/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/617/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-617/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12666124
move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.01.patch, HIVE-7208.02.patch, HIVE-7208.patch
For use in alternative input formats/serdes, it might be useful to move the SearchArgument class to a place that is not in ql (because it's hard to depend on ql).
[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7826: --- Component/s: Tez
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119846#comment-14119846 ] Damien Carol commented on HIVE-7826: I tested again with the latest version of the tez branch. I can confirm that it works. Massive performance improvement with this patch. Many of our OLAP cubes are partitioned by year. We can now filter just 1 or 2 years, which reduces query times. Thanks a lot [~hagleitn]
[jira] [Commented] (HIVE-7951) InputFormats implementing (Job)Configurable should not be cached
[ https://issues.apache.org/jira/browse/HIVE-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119815#comment-14119815 ] Hive QA commented on HIVE-7951: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666119/HIVE-7951.1.patch.txt {color:green}SUCCESS:{color} +1 6142 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/616/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/616/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-616/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated. ATTACHMENT ID: 12666119
InputFormats implementing (Job)Configurable should not be cached Key: HIVE-7951 URL: https://issues.apache.org/jira/browse/HIVE-7951 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7951.1.patch.txt
Currently, the initial configuration instance is shared with all subsequent input formats, which should not be the case.
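One way to express the rule in HIVE-7951 is a cache that shares instances of stateless input formats but hands out a fresh, freshly configured instance for any format that holds configuration. The sketch below is self-contained, so `ConfigurableLike` is a hypothetical stand-in for Hadoop's `Configurable`/`JobConfigurable`, configuration is modeled as a plain map, and none of the names are Hive's actual `HiveInputFormat` code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class InputFormatCacheSketch {

    /** Hypothetical stand-in for Hadoop's Configurable / JobConfigurable. */
    interface ConfigurableLike {
        void configure(Map<String, String> conf);
    }

    /** Hypothetical marker for an input format. */
    interface InputFormatLike {}

    /** A stateless format: safe to share. */
    static class PlainFormat implements InputFormatLike {}

    /** A format that keeps its configuration: must not be shared. */
    static class ConfiguredFormat implements InputFormatLike, ConfigurableLike {
        Map<String, String> conf;
        public void configure(Map<String, String> conf) { this.conf = conf; }
    }

    private final Map<Class<?>, InputFormatLike> cache = new HashMap<>();

    /** Shares stateless formats; creates and configures a new instance for configurable ones. */
    InputFormatLike get(Class<? extends InputFormatLike> cls,
                        Supplier<? extends InputFormatLike> factory,
                        Map<String, String> conf) {
        if (ConfigurableLike.class.isAssignableFrom(cls)) {
            InputFormatLike fresh = factory.get();
            ((ConfigurableLike) fresh).configure(conf); // each caller gets its own configuration
            return fresh;
        }
        return cache.computeIfAbsent(cls, k -> factory.get());
    }

    public static void main(String[] args) {
        InputFormatCacheSketch c = new InputFormatCacheSketch();
        InputFormatLike a = c.get(PlainFormat.class, PlainFormat::new, Map.of("k", "1"));
        InputFormatLike b = c.get(PlainFormat.class, PlainFormat::new, Map.of("k", "2"));
        System.out.println(a == b); // true: the stateless format is shared
        ConfiguredFormat x = (ConfiguredFormat) c.get(ConfiguredFormat.class, ConfiguredFormat::new, Map.of("k", "1"));
        ConfiguredFormat y = (ConfiguredFormat) c.get(ConfiguredFormat.class, ConfiguredFormat::new, Map.of("k", "2"));
        System.out.println(x.conf.get("k") + " " + y.conf.get("k")); // 1 2
    }
}
```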
[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119917#comment-14119917 ] Hive QA commented on HIVE-5760: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666176/HIVE-5760.91.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/619/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/619/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-619/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-619/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1622274.
At revision 1622274.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12666176 Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, HIVE-5760.91.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations.
[jira] [Commented] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119923#comment-14119923 ] Owen O'Malley commented on HIVE-7571: - +1 LGTM You might want to apply s/recIdCol/RecordIdColumn/g for readability, since it is a public API.
RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.2.patch, HIVE-7571.WIP.patch, HIVE-7571.patch
Currently RecordUpdater.update and delete take rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer from there. The writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case, where rows are processed 10k at a time. For these reasons it makes more sense to do this work in the writer.
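The proposal in HIVE-7571 is that the writer pulls the original transaction and row id out of the ROW__ID column of the row itself, and skips that column when writing user data. The sketch below models a row as a `List<Object>`; `RecordId`, `Event` and the column-index field are hypothetical stand-ins, not the real `RecordUpdater` or `RecordIdentifier` API.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIdFromRowSketch {

    /** Hypothetical stand-in for the ROW__ID struct: original transaction and row id. */
    record RecordId(long originalTransaction, long rowId) {}

    /** Hypothetical record of what a writer would persist for an update or delete. */
    record Event(String op, long originalTransaction, long rowId, List<Object> payload) {}

    /** Position of the ROW__ID virtual column within each row. */
    private final int recordIdColumn;

    RowIdFromRowSketch(int recordIdColumn) {
        this.recordIdColumn = recordIdColumn;
    }

    /** Reads ROW__ID from the row instead of taking it as extra parameters. */
    Event update(List<Object> row) {
        RecordId id = (RecordId) row.get(recordIdColumn);
        return new Event("update", id.originalTransaction(), id.rowId(), withoutRecordId(row));
    }

    /** A delete only needs the identifiers, so no user data is written. */
    Event delete(List<Object> row) {
        RecordId id = (RecordId) row.get(recordIdColumn);
        return new Event("delete", id.originalTransaction(), id.rowId(), List.of());
    }

    /** The writer skips the virtual column when writing user data. */
    private List<Object> withoutRecordId(List<Object> row) {
        List<Object> out = new ArrayList<>(row);
        out.remove(recordIdColumn); // remove by index, not by value
        return out;
    }

    public static void main(String[] args) {
        RowIdFromRowSketch writer = new RowIdFromRowSketch(0);
        List<Object> row = List.of(new RecordId(7L, 42L), "alice", 30);
        System.out.println(writer.update(row));
    }
}
```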
Re: Review Request 25245: Support dynamic service discovery for HiveServer2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/#review52171 ---
service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90921
It seems like we want more than a warning here if we fail to create the parent node. In this case we'll be unable to create the node for this instance, and clients will be unable to find the server. I would think this should be fatal.
service/src/java/org/apache/hive/service/server/HiveServer2.java https://reviews.apache.org/r/25245/#comment90922
Agree we should have a clean shutdown case. The timeout was 3 minutes, I think, which means clients will keep trying to contact the server for a while after it shuts down.
- Alan Gates
On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25245/ --- (Updated Sept. 2, 2014, 10:05 a.m.) Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.
Bugs: HIVE-7935 https://issues.apache.org/jira/browse/HIVE-7935
Repository: hive-git
Description --- https://issues.apache.org/jira/browse/HIVE-7935
Diffs -
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7
jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0
ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java 59294b1
service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 21c33bc
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c
service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc
service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 027931e
service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69
service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb
service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java 66fc1fc
Diff: https://reviews.apache.org/r/25245/diff/
Testing --- Manual testing + test cases.
Thanks, Vaibhav Gumashta
[jira] [Commented] (HIVE-7941) add test case for beeline line wrapping
[ https://issues.apache.org/jira/browse/HIVE-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119916#comment-14119916 ] Hive QA commented on HIVE-7941: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666158/HIVE-7941.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6143 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/618/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/618/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-618/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12666158
add test case for beeline line wrapping --- Key: HIVE-7941 URL: https://issues.apache.org/jira/browse/HIVE-7941 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Ferdinand Xu Attachments: HIVE-7941.patch
The patch HIVE-6928 does not add tests that actually verify that line wrapping takes place. It would be good to have a test for this.
[jira] [Commented] (HIVE-7890) SessionState creates HMS Client while not impersonating
[ https://issues.apache.org/jira/browse/HIVE-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119997#comment-14119997 ] Brock Noland commented on HIVE-7890: Thank you Dong and Prasad! I have committed this to trunk.
SessionState creates HMS Client while not impersonating --- Key: HIVE-7890 URL: https://issues.apache.org/jira/browse/HIVE-7890 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-7890.2.patch
In SessionState.start [an instance of the HMSClient is created|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L367]. When impersonation is enabled, this call does not occur within a doAs call, and thus the HMSClient is created as the server user, not the impersonated user. As a result, calls to the HMS are made by the hive user instead of the end user. This causes file ownership, such as the owner of a database directory, to be incorrect. While debugging this, I got the stack trace below. As you can see, we are calling getMSC without a doAs.
{noformat}
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2474)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:367)
	at org.apache.hive.service.cli.session.HiveSessionImpl.init(HiveSessionImpl.java:121)
	at org.apache.hive.service.cli.session.HiveSessionImplwithUGI.init(HiveSessionImplwithUGI.java:49)
	at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:130)
	at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:163)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:290)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:208)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
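Why creating the client outside a doAs matters can be shown with a self-contained simulation. In Hive the real mechanism is Hadoop's `UserGroupInformation.doAs`; to avoid a Hadoop dependency, the sketch below replaces it with a hypothetical `ThreadLocal`-based "current user" and a hypothetical `MetastoreClient` that captures the caller's identity at construction time. Only the ordering lesson carries over, not the API.

```java
import java.util.function.Supplier;

public class ImpersonationSketch {

    // Simulated per-thread "current user"; a stand-in for the security context
    // that UserGroupInformation.doAs establishes in the real system.
    private static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hive");

    /** Runs the action with the given user as current user, then restores the previous one. */
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    /** Hypothetical metastore client that records the caller's identity when created. */
    static final class MetastoreClient {
        final String user = CURRENT_USER.get();
    }

    public static void main(String[] args) {
        MetastoreClient outside = new MetastoreClient();                // the bug: created as the server user
        MetastoreClient inside = doAs("enduser", MetastoreClient::new); // the fix: created inside doAs
        System.out.println(outside.user + " " + inside.user);          // prints "hive enduser"
    }
}
```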
[jira] [Updated] (HIVE-7890) SessionState creates HMS Client while not impersonating
[ https://issues.apache.org/jira/browse/HIVE-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7890: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available)
[jira] [Created] (HIVE-7958) SparkWork generated by SparkCompiler may require multiple Spark jobs to run
Xuefu Zhang created HIVE-7958: - Summary: SparkWork generated by SparkCompiler may require multiple Spark jobs to run Key: HIVE-7958 URL: https://issues.apache.org/jira/browse/HIVE-7958 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Priority: Critical
A SparkWork instance currently may contain disjoint work graphs. For instance, union_remove_1.q may generate a plan like this:
{code}
Reduce2 - Map 1
Reduce4 - Map 3
{code}
The SparkPlan instance generated from this work graph contains two result RDDs. When such a plan is executed, we call .foreach() on the two RDDs sequentially, which results in two Spark jobs, one after the other. While this works functionally, performance will not be great, as the Spark jobs run sequentially rather than concurrently. Another side effect is that the corresponding SparkPlan instance is over-complicated. There are two potential approaches:
1. Let SparkCompiler generate a work that can be executed in ONE Spark job only. In the above example, two Spark tasks should be generated.
2. Let SparkPlanGenerator generate multiple Spark plans and then have SparkClient execute them concurrently.
Approach #1 seems more reasonable and fits naturally into our architecture. Also, Hive's task execution framework already takes care of task concurrency.
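Approach #1 implies first finding the disjoint pieces of a SparkWork graph so that each can become its own Spark task. A minimal sketch of that grouping step uses union-find over the work names from the plan above. The class and method names are hypothetical, edges are treated as undirected, and this is not SparkCompiler's actual logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class DisjointWorkSketch {

    /** Union-find parent pointers, keyed by work name. */
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    private void union(String a, String b) {
        String ra = find(a), rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /** Splits a work graph, given as edges between work names, into connected components. */
    static List<TreeSet<String>> components(List<String[]> edges) {
        DisjointWorkSketch uf = new DisjointWorkSketch();
        for (String[] e : edges) {
            uf.union(e[0], e[1]);
        }
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        for (String node : new ArrayList<>(uf.parent.keySet())) {
            groups.computeIfAbsent(uf.find(node), k -> new TreeSet<>()).add(node);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
            new String[] {"Map 1", "Reduce2"},
            new String[] {"Map 3", "Reduce4"});
        System.out.println(components(edges)); // prints [[Map 1, Reduce2], [Map 3, Reduce4]]
    }
}
```

Each resulting component would then map to one Spark task, leaving concurrency to Hive's existing task execution framework, as the issue suggests.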
[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
[ https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7682: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Prasad for the review! I have committed this to trunk!
HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-7682.1.patch, HIVE-7682.2.patch, HIVE-7682.3.patch
In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf, we create new Configuration objects so we can set the authentication type. When the new Configuration object is loaded, it looks for the core-site.xml of the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to that core-site.xml because it is cluster-agnostic.
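The idea in the HIVE-7682 title (don't reset the configuration unless required) can be sketched as: reuse the caller's configuration when it already carries the wanted authentication method, and otherwise copy the caller's own settings rather than building a fresh object that reloads cluster files. Configuration is modeled here as a plain map; the helper is hypothetical, and the key `hadoop.security.authentication` is assumed to be the setting the bridge adjusts.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfReuseSketch {

    /** Assumed name of the setting that selects the authentication method. */
    static final String AUTH_KEY = "hadoop.security.authentication";

    /**
     * Returns the caller's configuration untouched when it already has the wanted
     * authentication method; otherwise returns a copy of the caller's settings
     * (not of freshly loaded cluster defaults) with only that value overridden.
     */
    static Map<String, String> withAuth(Map<String, String> conf, String authMethod) {
        if (authMethod.equalsIgnoreCase(conf.get(AUTH_KEY))) {
            return conf; // nothing to change: reuse as-is
        }
        Map<String, String> copy = new HashMap<>(conf);
        copy.put(AUTH_KEY, authMethod);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(AUTH_KEY, "kerberos");
        System.out.println(withAuth(conf, "kerberos") == conf);      // true: reused, nothing reloaded
        System.out.println(withAuth(conf, "simple").get(AUTH_KEY)); // simple
    }
}
```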
[jira] [Updated] (HIVE-7923) populate stats for test tables
[ https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7923: --- Status: Open (was: Patch Available) Need to update .q.out files.
populate stats for test tables -- Key: HIVE-7923 URL: https://issues.apache.org/jira/browse/HIVE-7923 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch
Currently, q_test only generates tables (e.g., src) but does not create stats for them. All the test cases will fail in CBO because CBO depends on the stats.
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120011#comment-14120011 ] Ashutosh Chauhan commented on HIVE-7944: [~pxiong] Have you tested this with MySQL where the tables are pre-created (not auto-created via mysql)? I think there might be issues, because in those cases a null csid/partid won't be inserted in the db. I think we should simply revert HIVE-7876 until we figure out a proper fix for this.
current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Reporter: pengcheng xiong Assignee: pengcheng xiong Attachments: HIVE-7944.1.patch
We worked hard towards faster update stats for columns of a partition of a table previously (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, it is only correct in the first run; later runs produce duplicate column stats. Thanks to [~ekoifman]'s comments.
[jira] [Comment Edited] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120011#comment-14120011 ] Ashutosh Chauhan edited comment on HIVE-7944 at 9/3/14 4:04 PM: [~pxiong] Have you tested this with MySQL when the tables are pre-created (not auto-created via DataNucleus)? I think there might be issues, because in those cases a null csid/partid will prevent the data from being inserted into the db. I think we should simply revert HIVE-7876 until we figure out a proper fix for this.
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120012#comment-14120012 ] Hive QA commented on HIVE-7689: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666180/HIVE-7889.4.patch {color:green}SUCCESS:{color} +1 6142 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/620/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/620/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-620/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12666180 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches that make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION, and fixes an error in STATS, on the metastore.
[jira] [Created] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5
Brock Noland created HIVE-7959: -- Summary: TestHadoop20SAuthBridge fails to compile under hadoop 2.5 Key: HIVE-7959 URL: https://issues.apache.org/jira/browse/HIVE-7959 Project: Hive Issue Type: Bug Reporter: Brock Noland I tested the build against Hadoop 2.5 and it fails to compile due to the use of private APIs that have changed.
[jira] [Updated] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5
[ https://issues.apache.org/jira/browse/HIVE-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7959: --- Attachment: HIVE-7959.1.patch
[jira] [Created] (HIVE-7960) Upgrade to Hadoop 2.5
Brock Noland created HIVE-7960: -- Summary: Upgrade to Hadoop 2.5 Key: HIVE-7960 URL: https://issues.apache.org/jira/browse/HIVE-7960 Project: Hive Issue Type: Task Reporter: Brock Noland Tracking JIRA for upgrading to Hadoop 2.5.
[jira] [Resolved] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5
[ https://issues.apache.org/jira/browse/HIVE-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-7959. Resolution: Duplicate
[jira] [Updated] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
[ https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7184: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7960 TestHadoop20SAuthBridge no longer compiles after HADOOP-10448 - Key: HIVE-7184 URL: https://issues.apache.org/jira/browse/HIVE-7184 Project: Hive Issue Type: Sub-task Components: Tests Affects Versions: 0.14.0 Reporter: Jason Dere Attachments: HIVE-7184.1.patch, HIVE-7184.2.patch HADOOP-10448 moves a couple of methods that were being used by the TestHadoop20SAuthBridge test. If/when the Hive build uses Hadoop 2.5 as a dependency, this will cause compilation errors.
[jira] [Commented] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
[ https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120082#comment-14120082 ] Brock Noland commented on HIVE-7184: Made this a subtask of HIVE-7960.
[jira] [Commented] (HIVE-7553) avoid the scheduling maintenance window for every jar change
[ https://issues.apache.org/jira/browse/HIVE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120093#comment-14120093 ] Brock Noland commented on HIVE-7553: [~Ferd] the jars cannot be included in the patch itself. Can you upload them to the JIRA for manual testing? Thanks!! avoid the scheduling maintenance window for every jar change Key: HIVE-7553 URL: https://issues.apache.org/jira/browse/HIVE-7553 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7553.1.patch, HIVE-7553.2.patch, HIVE-7553.3.patch, HIVE-7553.patch, HIVE-7553.pdf When a user needs to refresh an existing jar or add a new jar to HS2, HS2 has to be restarted. As HS2 is a service exposed to clients, this requires scheduling a maintenance window for every jar change. It would be great if we could avoid that.
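One common JVM technique for avoiding a restart, offered only as a sketch and not as what the attached patches do, is to build a fresh java.net.URLClassLoader whenever a jar is added and hand it to new sessions. ReloadableJarRegistry, addJar, and loaderForNewSession are hypothetical names; the sketch covers adding jars and does not address replacing a jar's contents in place.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not HiveServer2 code: each added jar yields a fresh class loader
// that new sessions use, so the server process itself does not have to restart.
final class ReloadableJarRegistry {
    private final ClassLoader parent;
    private final List<Path> jars = new ArrayList<>();
    private volatile URLClassLoader current;

    ReloadableJarRegistry(ClassLoader parent) {
        this.parent = parent;
        this.current = new URLClassLoader(new URL[0], parent);
    }

    synchronized void addJar(Path jar) {
        if (!jars.contains(jar)) {
            jars.add(jar);
        }
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        // Sessions that already started keep their loader; the old loader is deliberately
        // not closed here because those sessions may still load classes from it.
        current = new URLClassLoader(urls, parent);
    }

    ClassLoader loaderForNewSession() {
        return current;
    }
}
```

The design choice is to never mutate a loader that a running session may be using. A real server would also have to close superseded loaders once no session references them.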
[jira] [Commented] (HIVE-7948) Add an E2E test to verify fix for HIVE-7155
[ https://issues.apache.org/jira/browse/HIVE-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120098#comment-14120098 ] Eugene Koifman commented on HIVE-7948: -- It seems like this test requires manual setup steps. This complicates running the tests and, in practice, usually means that they are not being run. If you look at hcatalog/src/tests/e2e/templeton/deployers, there is a set of scripts that help automate the setup. It should be easy to modify deploy_e2e_artifacts to make sure the newly required data files are copied to HDFS. config/ has some pre-canned config files; this logic may need to be improved a bit to be able to deploy a -site.xml file specific to a given group of tests. Add an E2E test to verify fix for HIVE-7155 Key: HIVE-7948 URL: https://issues.apache.org/jira/browse/HIVE-7948 Project: Hive Issue Type: Test Components: Tests, WebHCat Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-7948.patch E2E test to verify that the WebHCat property templeton.mapper.memory.mb correctly overrides mapreduce.map.memory.mb. The feature was added as part of HIVE-7155.
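The behavior this E2E test checks can be stated as a precedence rule. The sketch below takes "correctly overrides" to mean: when templeton.mapper.memory.mb is set, the launched job sees that value as mapreduce.map.memory.mb; otherwise the job keeps its own setting. It uses java.util.Properties in place of Hadoop's Configuration, and ControllerMemory is a hypothetical helper, not WebHCat code.

```java
import java.util.Properties;

// Hypothetical helper expressing the precedence rule the E2E test is meant to verify.
final class ControllerMemory {
    static final String TEMPLETON_KEY = "templeton.mapper.memory.mb";
    static final String MAPREDUCE_KEY = "mapreduce.map.memory.mb";

    // Returns a copy of the job settings in which the WebHCat value, when set, takes precedence.
    static Properties effectiveJobConf(Properties webhcatConf, Properties jobConf) {
        Properties effective = new Properties();
        effective.putAll(jobConf); // copy so the caller's settings are left untouched
        String override = webhcatConf.getProperty(TEMPLETON_KEY);
        if (override != null && !override.trim().isEmpty()) {
            effective.setProperty(MAPREDUCE_KEY, override.trim());
        }
        return effective;
    }
}
```

An automated test built on this rule needs two cases: one with the WebHCat property unset (job value kept) and one with it set (WebHCat value wins).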
[jira] [Created] (HIVE-7961) metastore schema improvement for adding partition to Hive table
Chu Tong created HIVE-7961: -- Summary: metastore schema improvement for adding partition to Hive table Key: HIVE-7961 URL: https://issues.apache.org/jira/browse/HIVE-7961 Project: Hive Issue Type: Bug Components: Metastore Reporter: Chu Tong Priority: Minor One of the performance bottlenecks when adding a partition to a Hive table is the following query, which takes most of the time in this process: SELECT A0.PART_NAME FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = @P0 AND C0.NAME = @P1 AND A0.PART_NAME = @P2 This query joins the partition table (PARTITIONS) with the table table (TBLS) and the database table (DBS) in the Hive metastore, and it becomes slow when these tables are big. A viable way to optimize this is to de-normalize the partition table.
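To illustrate what de-normalizing would change, here is a small in-memory Java model. PartitionLookup and its fields are hypothetical. existsViaJoin mirrors the three-table join in the query above by resolving TBLS and DBS for each candidate partition row, while existsDenormalized assumes the database and table names are stored with the partition, so the check becomes one lookup on a composite key. The model says nothing about the plan a real metastore RDBMS would choose, which also depends on its indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the three metastore tables named in the query.
final class PartitionLookup {
    // Normalized layout: DBS (DB_ID -> NAME), TBLS (TBL_ID -> DB_ID, TBL_NAME), PARTITIONS (TBL_ID, PART_NAME).
    private final Map<Long, String> dbName = new HashMap<>();
    private final Map<Long, Long> tblDbId = new HashMap<>();
    private final Map<Long, String> tblName = new HashMap<>();
    private final List<Object[]> partitions = new ArrayList<>(); // rows of {Long tblId, String partName}

    // De-normalized copy: database and table names stored alongside each partition name.
    private final Set<List<String>> denormalized = new HashSet<>();

    void addPartition(long dbId, String db, long tblId, String tbl, String part) {
        dbName.put(dbId, db);
        tblDbId.put(tblId, dbId);
        tblName.put(tblId, tbl);
        partitions.add(new Object[] {tblId, part});
        denormalized.add(Arrays.asList(db, tbl, part));
    }

    // Mirrors the join: for each partition row, resolve its table and database and compare names.
    boolean existsViaJoin(String db, String tbl, String part) {
        for (Object[] row : partitions) {
            Long tblId = (Long) row[0];
            if (part.equals(row[1])
                    && tbl.equals(tblName.get(tblId))
                    && db.equals(dbName.get(tblDbId.get(tblId)))) {
                return true;
            }
        }
        return false;
    }

    // With the names on the partition row, the same check is a single composite-key lookup.
    boolean existsDenormalized(String db, String tbl, String part) {
        return denormalized.contains(Arrays.asList(db, tbl, part));
    }
}
```

The trade-off is the usual one for de-normalization: faster reads at the cost of keeping the copied names consistent when a table or database is renamed.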
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120161#comment-14120161 ] Eugene Koifman commented on HIVE-7944: -- +1 for the revert idea
[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner
[ https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120237#comment-14120237 ] Pala M Muthaia commented on HIVE-7949: -- [~ashutoshc] [~navis], can either of you review or add appropriate reviewers for this patch? Also, I ran the 2 tests that failed above locally and they passed. Thanks. Create table LIKE command doesn't set new owner --- Key: HIVE-7949 URL: https://issues.apache.org/jira/browse/HIVE-7949 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0, 0.13.1 Reporter: Pala M Muthaia Assignee: Pala M Muthaia Fix For: 0.13.0, 0.13.1 Attachments: HIVE-7949.1.patch The 'Create table like' command doesn't set the current user as the owner of the new table; instead, the new table's owner is the same as the source table's owner. This is a regression from 0.12.
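The behavior the description asks for amounts to copying the source table's definition while recording the user who runs the command as the owner. A minimal sketch, assuming a hypothetical TableMeta class that keeps only a name, an owner, and columns (Hive's real table metadata carries far more, and this is not the attached patch):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, minimal table metadata; not Hive's Table class.
final class TableMeta {
    final String name;
    final String owner;
    final Map<String, String> columns; // column name -> type, in declaration order

    TableMeta(String name, String owner, Map<String, String> columns) {
        this.name = name;
        this.owner = owner;
        this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(columns));
    }
}

final class CreateTableLike {
    // Copies the schema from the source table, but the new table belongs to the user running the command.
    static TableMeta createLike(TableMeta source, String newName, String currentUser) {
        return new TableMeta(newName, currentUser, source.columns);
    }
}
```

The regression described above corresponds to passing source.owner instead of currentUser in createLike.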
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7944: -- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7944: -- Attachment: HIVE-7944.2.patch Revert the patch.
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengcheng xiong updated HIVE-7944: -- Status: Patch Available (was: Open) Revert the patch.
Re: Review Request 25281: speed up the write path of col stats of partitions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25281/ --- (Updated Sept. 3, 2014, 6:50 p.m.) Review request for hive. Repository: hive-git Description (updated) --- revert the patch Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 68b5563 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a16d1c2 metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0fdafa2 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 6b5e79d metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 981c5ff Diff: https://reviews.apache.org/r/25281/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner
[ https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120252#comment-14120252 ] Ashutosh Chauhan commented on HIVE-7949: LGTM +1 cc: [~thejas]
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120254#comment-14120254 ] pengcheng xiong commented on HIVE-7944: --- After testing this with MySQL with pre-created tables (not auto-created via DataNucleus), I found that my method does not work. Thus, I followed the revert idea: https://reviews.apache.org/r/25281/
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120263#comment-14120263 ] Ashutosh Chauhan commented on HIVE-7944: +1
[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120264#comment-14120264 ] Ashu Pachauri commented on HIVE-7943: - [~thejas] Can you have a look at this? hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization does not adapt to the change, and table owners do not inherit permissions from the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details.
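A sketch of the expected behavior, assuming the setting holds a comma-separated list of privilege names and that ALL implies every other privilege: once the configured owner grants are consulted, the owner's DROP in the reproduction above passes. Priv is an illustrative subset of privilege names and OwnerGrants is a hypothetical helper; neither is Hive's Default Authorization code.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Illustrative subset of privilege names; not Hive's full privilege list.
enum Priv { ALL, ALTER, CREATE, DROP, SELECT, UPDATE }

// Hypothetical helper: what honoring the owner-grants setting for a table owner could look like.
final class OwnerGrants {
    // Parses a comma-separated value such as "ALL" or "select,drop" (case-insensitive).
    static Set<Priv> parse(String confValue) {
        Set<Priv> privs = EnumSet.noneOf(Priv.class);
        if (confValue == null) {
            return privs;
        }
        for (String token : confValue.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                privs.add(Priv.valueOf(name.toUpperCase(Locale.ROOT)));
            }
        }
        return privs;
    }

    // The owner passes the check when the configured grants contain the privilege or ALL.
    static boolean ownerMayPerform(Set<Priv> ownerGrants, Priv required) {
        return ownerGrants.contains(Priv.ALL) || ownerGrants.contains(required);
    }
}
```

With owner grants of ALL, ownerMayPerform(..., Priv.DROP) returns true, which is the outcome the reproduction expects but does not get.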
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Attachment: HIVE-7223.5.patch I've taken your advice a step further, and removed redundancy in {{HiveMetaStore.initializeAddedPartition()}}, {{MetaStoreUtils.updatePartitionStatsFast()}}, and {{Warehouse.getFileStatusesForSD()}}. Cleaner, all around. Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, HIVE-7223.4.patch, HIVE-7223.5.patch Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using List<Partition>. E.g. {code} public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts); public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts); public int add_partitions(List<Partition> new_parts); {code} Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition-values, etc. Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense of serializing and deserializing metadata for large sets of partitions, w.r.t. time and heap space. Reducing the Thrift traffic should help in this regard. In a date-partitioned table, all sub-partitions for a particular date are *likely* (but not expected) to have: # The same base directory (e.g. {{/feeds/search/20140601/}}) # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}}) # The same SerDe/StorageHandler/IOFormat classes # Sorting/Bucketing/SkewInfo settings
In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition-list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory. We can go one better for the {{add_partitions()}} case: When adding all partitions for a given date, the “normal” case affords us the ability to specify the top-level date-directory, where sub-partitions can be inferred from the HDFS directory-path. These extensions are hard to introduce at the metastore level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help: {code} public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ; public int add_partitions( PartitionSpec new_parts ) throws … ; {code} where the PartitionSpec looks like: {code} public interface PartitionSpec { public List<Partition> getPartitions(); public List<String> getPartNames(); public Iterator<Partition> getPartitionIter(); public Iterator<String> getPartNameIter(); } {code} For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them. Similarly, list_partitions() could return a List<PartitionSpec>, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore needs to instantiate all partitions at once. That should help with memory requirements. In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse than the status quo.
PartitionSpec abstracts away how a set of partitions may be represented. A tighter representation allows us to communicate metadata for a larger number of Partitions, with less Thrift traffic. Given that Thrift doesn’t support polymorphism, we’d have to implement the PartitionSpec as a Thrift Union of supported implementations. (We could convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec sub-class.) Thoughts?
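A minimal Java sketch of the proposed split, assuming a stripped-down Partition class (not the Thrift-generated one) and a hypothetical SharedRootPartitionSpec for the “normal” case; DefaultPartitionSpec follows the fallback named above. It keeps only iterator access, omits the name accessors and the Thrift union, and uses a single root location string in place of a shared StorageDescriptor.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, minimal partition: its values plus its data location.
final class Partition {
    final List<String> values;
    final String location;

    Partition(List<String> values, String location) {
        this.values = values;
        this.location = location;
    }
}

// Reduced version of the proposed interface: callers iterate instead of receiving a full list.
interface PartitionSpec {
    Iterator<Partition> getPartitionIter();

    int size();
}

// Fallback that composes a plain list, so it is no worse than returning List<Partition>.
final class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;

    DefaultPartitionSpec(List<Partition> partitions) {
        this.partitions = partitions;
    }

    @Override
    public Iterator<Partition> getPartitionIter() {
        return partitions.iterator();
    }

    @Override
    public int size() {
        return partitions.size();
    }
}

// "Normal" case: all partitions share one root directory (e.g. /feeds/search/20140601),
// and each Partition object is only built when the iterator reaches it.
final class SharedRootPartitionSpec implements PartitionSpec {
    private final String rootLocation;
    private final String rootValue;         // e.g. "20140601"
    private final List<String> childValues; // sub-directory names, e.g. US, UK, IN

    SharedRootPartitionSpec(String rootLocation, String rootValue, List<String> childValues) {
        this.rootLocation = rootLocation;
        this.rootValue = rootValue;
        this.childValues = childValues;
    }

    @Override
    public Iterator<Partition> getPartitionIter() {
        final Iterator<String> children = childValues.iterator();
        return new Iterator<Partition>() {
            @Override
            public boolean hasNext() {
                return children.hasNext();
            }

            @Override
            public Partition next() {
                String child = children.next();
                return new Partition(Arrays.asList(rootValue, child), rootLocation + "/" + child);
            }
        };
    }

    @Override
    public int size() {
        return childValues.size();
    }
}
```

Because SharedRootPartitionSpec stores one root plus a list of sub-directory names, its serialized form grows with the number of names rather than with full per-partition metadata, which is the Thrift-traffic argument made above.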
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Status: Open (was: Patch Available) Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.13.0, 0.12.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, HIVE-7223.4.patch, HIVE-7223.5.patch Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using List<Partition>. E.g. {code} public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts); public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts); public int add_partitions(List<Partition> new_parts); {code} Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition-values, etc. Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense, in both time and heap space, of serializing and deserializing metadata for large sets of partitions. Reducing the Thrift traffic should help in this regard. In a date-partitioned table, all sub-partitions for a particular date are *likely* (though not guaranteed) to have: # The same base directory (e.g. {{/feeds/search/20140601/}}) # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
# The same SerDe/StorageHandler/IOFormat classes # Sorting/Bucketing/SkewInfo settings In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition-list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory. We can go one better for the {{add_partitions()}} case: when adding all partitions for a given date, the “normal” case lets us specify just the top-level date-directory, from which the sub-partitions can be inferred via the HDFS directory-paths. These extensions are hard to introduce at the metastore-level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help: {code} public PartitionSpec listPartitions(String db_name, String tbl_name, short max_parts) throws ... ; public int add_partitions( PartitionSpec new_parts ) throws … ; {code} where the PartitionSpec looks like: {code} public interface PartitionSpec { public List<Partition> getPartitions(); public List<String> getPartNames(); public Iterator<Partition> getPartitionIter(); public Iterator<String> getPartNameIter(); } {code} For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them. Similarly, list_partitions() could return a List<PartitionSpec>, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore needs to instantiate all partitions at once, which should help with memory requirements. In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which wraps a {{List<Partition>}} and is no worse than the status quo.
PartitionSpec abstracts away how a set of partitions may be represented. A tighter representation allows us to communicate metadata for a larger number of Partitions with less Thrift traffic. Given that Thrift doesn’t support polymorphism, we’d have to implement the PartitionSpec as a Thrift Union of supported implementations. (We could convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec sub-class.) Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
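A minimal Java sketch of the PartitionSpec idea follows, assuming heavily simplified stand-ins for the metastore's Thrift Partition and StorageDescriptor classes (only a location and a SerDe name are modeled). DirBasedPartitionSpec is a hypothetical name for the directory-based variant: it keeps one shared descriptor for the date directory plus the sub-directory names, and builds Partition objects lazily through the iterator. DefaultPartitionSpec is the fallback that simply wraps a List<Partition>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, heavily simplified stand-ins for the metastore's Thrift classes.
final class StorageDescriptor {
  final String location;
  final String serde;

  StorageDescriptor(String location, String serde) {
    this.location = location;
    this.serde = serde;
  }
}

final class Partition {
  final List<String> values;
  final StorageDescriptor sd;

  Partition(List<String> values, StorageDescriptor sd) {
    this.values = values;
    this.sd = sd;
  }
}

interface PartitionSpec {
  // Iterator semantics: callers need not hold every partition at once.
  Iterator<Partition> getPartitionIter();

  // Convenience method that materializes everything (the status-quo shape).
  default List<Partition> getPartitions() {
    List<Partition> out = new ArrayList<>();
    getPartitionIter().forEachRemaining(out::add);
    return out;
  }
}

// Fallback when no grouping is possible: just wraps a List<Partition>.
final class DefaultPartitionSpec implements PartitionSpec {
  private final List<Partition> parts;

  DefaultPartitionSpec(List<Partition> parts) {
    this.parts = parts;
  }

  @Override
  public Iterator<Partition> getPartitionIter() {
    return parts.iterator();
  }
}

// "Normal" case: one shared descriptor rooted at the date directory,
// plus the sub-directory names (e.g. US, UK, IN) of the sub-partitions.
final class DirBasedPartitionSpec implements PartitionSpec {
  private final String dateValue;
  private final StorageDescriptor rootSd;
  private final List<String> subDirs;

  DirBasedPartitionSpec(String dateValue, StorageDescriptor rootSd, List<String> subDirs) {
    this.dateValue = dateValue;
    this.rootSd = rootSd;
    this.subDirs = subDirs;
  }

  @Override
  public Iterator<Partition> getPartitionIter() {
    Iterator<String> dirs = subDirs.iterator();
    return new Iterator<Partition>() {
      @Override
      public boolean hasNext() {
        return dirs.hasNext();
      }

      @Override
      public Partition next() {
        String dir = dirs.next();
        // Each Partition is built on demand from the shared descriptor;
        // only the location differs (root directory + sub-directory name).
        StorageDescriptor sd = new StorageDescriptor(rootSd.location + dir, rootSd.serde);
        return new Partition(Arrays.asList(dateValue, dir), sd);
      }
    };
  }
}
```

In this sketch only the compact form (one descriptor plus a list of names) would need to cross the wire; full per-partition objects exist only while a caller iterates. No HDFS listing is performed: the sub-directory names are passed in, whereas the proposal would infer them from the directory path.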
[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner
[ https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120267#comment-14120267 ] Thejas M Nair commented on HIVE-7949: - +1 Create table LIKE command doesn't set new owner --- Key: HIVE-7949 URL: https://issues.apache.org/jira/browse/HIVE-7949 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0, 0.13.1 Reporter: Pala M Muthaia Assignee: Pala M Muthaia Fix For: 0.13.0, 0.13.1 Attachments: HIVE-7949.1.patch The 'create table like' command doesn't set the current user as the owner of the new table; instead, the new table's owner is the same as the source table's owner. This is a regression from 0.12. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
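The expected behavior can be sketched in Java with a hypothetical, simplified table-metadata class: CREATE TABLE ... LIKE copies the source table's schema, but the owner comes from the user running the statement rather than from the source table. This illustrates the intended semantics only; TableMeta and CreateTableLikeSketch are not Hive classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified table metadata (not Hive's Table class).
final class TableMeta {
  final String name;
  final String owner;
  final Map<String, String> columns; // column name -> type

  TableMeta(String name, String owner, Map<String, String> columns) {
    this.name = name;
    this.owner = owner;
    this.columns = columns;
  }
}

final class CreateTableLikeSketch {
  // CREATE TABLE newName LIKE source: copy the schema (as a fresh map),
  // but make the user running the statement the owner of the new table.
  static TableMeta createLike(TableMeta source, String newName, String currentUser) {
    return new TableMeta(newName, currentUser, new LinkedHashMap<>(source.columns));
  }
}
```

The reported regression corresponds to passing source.owner instead of currentUser in createLike.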
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Status: In Progress (was: Patch Available) Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, HIVE-5760.91.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Attachment: HIVE-5760.92.patch Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, HIVE-5760.91.patch, HIVE-5760.92.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Status: Patch Available (was: In Progress) Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, HIVE-5760.91.patch, HIVE-5760.92.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
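One piece of VARCHAR-specific logic on top of the STRING code path is length enforcement. The sketch below assumes that a VARCHAR(n) value is truncated to at most n characters, and that values are stored, as for STRING, as UTF-8 byte slices (buffer, start, length) in a column vector. BytesColumn and VarcharTruncate are hypothetical, simplified names, not Hive's vectorization classes.

```java
// Hypothetical, simplified stand-in for a vectorized bytes column: each row is
// a (buffer, start, length) slice over UTF-8 bytes, as on the STRING path.
final class BytesColumn {
  final byte[][] vector;
  final int[] start;
  final int[] length;

  BytesColumn(int size) {
    vector = new byte[size][];
    start = new int[size];
    length = new int[size];
  }

  void set(int row, byte[] buf, int s, int len) {
    vector[row] = buf;
    start[row] = s;
    length[row] = len;
  }
}

final class VarcharTruncate {
  // Byte length of the longest prefix of the UTF-8 slice holding at most
  // maxChars characters (code points). Continuation bytes match 10xxxxxx.
  static int truncatedLength(byte[] buf, int start, int len, int maxChars) {
    int chars = 0;
    for (int i = start; i < start + len; i++) {
      if ((buf[i] & 0xC0) != 0x80) { // a lead byte begins a new character
        if (chars == maxChars) {
          return i - start;
        }
        chars++;
      }
    }
    return len;
  }

  // Applies VARCHAR(maxChars) truncation to the first n rows of a batch by
  // shortening each row's length; bytes are not copied. Null handling omitted.
  static void truncateBatch(BytesColumn col, int n, int maxChars) {
    for (int row = 0; row < n; row++) {
      col.length[row] = truncatedLength(col.vector[row], col.start[row], col.length[row], maxChars);
    }
  }
}
```

Because only the length array changes, truncation runs as one tight loop over the batch with no byte copying, which fits the batch-at-a-time style of vectorized execution.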
[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120306#comment-14120306 ] Thejas M Nair commented on HIVE-7943: - This patch does not add the owner grants to the table metadata, which is the purpose of this configuration flag; instead, it adds the privileges at runtime during the checks. Looking at the current code again, I don't see a bug there with respect to the privileges getting set at table creation. I wonder if the problem is that ALL privileges are not being correctly interpreted as including the Drop privilege. For the example in the description, can you paste the output of 'show grant on table temp_table'? hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization does not adapt to the change, and table owners do not inherit permissions from the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3865) Allow collect_set to work on non-primitive types
[ https://issues.apache.org/jira/browse/HIVE-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120362#comment-14120362 ] karthik commented on HIVE-3865: --- Did you build a UDF for collect_set() of structs? How did you sort out the issue? Allow collect_set to work on non-primitive types Key: HIVE-3865 URL: https://issues.apache.org/jira/browse/HIVE-3865 Project: Hive Issue Type: Improvement Reporter: Ron Bodkin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7946: - Issue Type: Bug (was: Sub-task) Parent: (was: HIVE-5775) CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7962) Prevent Alter Table, drop,show Code paths from exercising CBO
Laljo John Pullokkaran created HIVE-7962: Summary: Prevent Alter Table, drop,show Code paths from exercising CBO Key: HIVE-7962 URL: https://issues.apache.org/jira/browse/HIVE-7962 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7963) Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr
Laljo John Pullokkaran created HIVE-7963: Summary: Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr Key: HIVE-7963 URL: https://issues.apache.org/jira/browse/HIVE-7963 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7965) Handle Row Schema
Laljo John Pullokkaran created HIVE-7965: Summary: Handle Row Schema Key: HIVE-7965 URL: https://issues.apache.org/jira/browse/HIVE-7965 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7964) Handle explode, lateral views
Laljo John Pullokkaran created HIVE-7964: Summary: Handle explode, lateral views Key: HIVE-7964 URL: https://issues.apache.org/jira/browse/HIVE-7964 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure
Laljo John Pullokkaran created HIVE-7966: Summary: CBO Trunk Merge: Hive Unit test Subquery test failure Key: HIVE-7966 URL: https://issues.apache.org/jira/browse/HIVE-7966 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120383#comment-14120383 ] Hive QA commented on HIVE-7944: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12666289/HIVE-7944.2.patch {color:green}SUCCESS:{color} +1 6142 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/621/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/621/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-621/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12666289 current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Reporter: pengcheng xiong Assignee: pengcheng xiong Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch We previously worked hard on faster stats updates for the columns of a table partition (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
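Duplicate stats on later runs are what one would expect if a re-run inserts a new stats row instead of replacing the existing one. The minimal Java sketch below contrasts the two patterns, assuming stats are identified by (database, table, partition, column) and reduced to a single number-of-distinct-values figure. It illustrates the general update-or-insert idea and is not the committed HIVE-7944 patch; StatsKey and ColumnStatsStore are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical key for one column's stats in one partition.
final class StatsKey {
  final String db;
  final String table;
  final String partition;
  final String column;

  StatsKey(String db, String table, String partition, String column) {
    this.db = db;
    this.table = table;
    this.partition = partition;
    this.column = column;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof StatsKey)) return false;
    StatsKey k = (StatsKey) o;
    return db.equals(k.db) && table.equals(k.table)
        && partition.equals(k.partition) && column.equals(k.column);
  }

  @Override
  public int hashCode() {
    return Objects.hash(db, table, partition, column);
  }
}

final class ColumnStatsStore {
  // Append-only: each run adds a row, so a second run leaves a duplicate.
  final List<Map.Entry<StatsKey, Long>> appendOnly = new ArrayList<>();
  // Update-or-insert: a second run replaces the existing row for the key.
  final Map<StatsKey, Long> upserted = new HashMap<>();

  void appendUpdate(StatsKey key, long numDistinct) {
    appendOnly.add(new AbstractMap.SimpleEntry<StatsKey, Long>(key, numDistinct));
  }

  void upsertUpdate(StatsKey key, long numDistinct) {
    upserted.put(key, numDistinct);
  }
}
```

Running the same update twice leaves two entries in appendOnly but exactly one (holding the latest value) in upserted.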
[jira] [Updated] (HIVE-7963) CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr
[ https://issues.apache.org/jira/browse/HIVE-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7963: - Summary: CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr (was: Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr ) CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr Key: HIVE-7963 URL: https://issues.apache.org/jira/browse/HIVE-7963 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure
[ https://issues.apache.org/jira/browse/HIVE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-7966: Assignee: Harish Butani CBO Trunk Merge: Hive Unit test Subquery test failure - Key: HIVE-7966 URL: https://issues.apache.org/jira/browse/HIVE-7966 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Harish Butani -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7962) CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from exercising CBO
[ https://issues.apache.org/jira/browse/HIVE-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7962: - Summary: CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from exercising CBO (was: Prevent Alter Table, drop,show Code paths from exercising CBO) CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from exercising CBO - Key: HIVE-7962 URL: https://issues.apache.org/jira/browse/HIVE-7962 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7964) CBO Trunk Merge:Handle explode, lateral views
[ https://issues.apache.org/jira/browse/HIVE-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7964: - Summary: CBO Trunk Merge:Handle explode, lateral views (was: Handle explode, lateral views) CBO Trunk Merge:Handle explode, lateral views - Key: HIVE-7964 URL: https://issues.apache.org/jira/browse/HIVE-7964 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120395#comment-14120395 ] Ashu Pachauri commented on HIVE-7943: - Is that the purpose of the configuration flag? I thought the reason for separating owner grants from user grants was that the owner grants are applied dynamically, at authorization time, to the current owner (in case there were a way to change the owner). If they are persisted in metadata, the grants need to be changed whenever the owner changes or the configuration property changes (e.g. from ALL to SELECT, DROP, etc.). 'show grant on temp_table' gives me empty results unless I explicitly run 'grant all on temp_table to user testuser'. The problem is not limited to ALL privileges; the same problem occurs when I change the configuration property to DROP instead of ALL. hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization does not adapt to the change, and table owners do not inherit permissions from the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure
[ https://issues.apache.org/jira/browse/HIVE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120400#comment-14120400 ] Laljo John Pullokkaran commented on HIVE-7966: --
1. select * from (select * from src b where exists (select a.key from src a where b.value = a.value and a.key = b.key and a.value 'val_9') ) a
2. select b.key, min(b.value) from src b group by b.key having exists ( select a.key from src a where a.value 'val_9' and a.value = min(b.value) )
3. select p.p_partkey, li.l_suppkey from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li on p.p_partkey = li.l_partkey where li.l_linenumber = 1 and li.l_orderkey in (select l_orderkey from lineitem where l_shipmode = 'AIR' and l_linenumber = li.l_linenumber)
4. explain select p_mfgr, p_name, avg(p_size) from part group by p_mfgr, p_name having p_name in (select first_value(p_name) over(partition by p_mfgr order by p_size) from part)
5. select * from src b where not exists (select a.key from src a where b.value = a.value and a.value 'val_2' )
6. select * from src b group by key, value having not exists (select distinct a.key from src a where b.value = a.value and a.value 'val_12' )
7. select * from T1_v where T1_v.key not in (select T2_v.key from T2_v)
8. select b.p_mfgr, min(p_retailprice) from part b group by b.p_mfgr having b.p_mfgr not in (select p_mfgr from part a group by p_mfgr having max(p_retailprice) - min(p_retailprice) 600 )
9. explain select p_mfgr, b.p_name, p_size from part b where b.p_name not in (select p_name from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by p_size) as r from part) a where r = 2 and b.p_mfgr = p_mfgr )
10. select * from cv3 where cv3.key in (select key from cv1)
CBO Trunk Merge: Hive Unit test Subquery test failure - Key: HIVE-7966 URL: https://issues.apache.org/jira/browse/HIVE-7966 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Harish Butani -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7967) CBO Trunk Merge: Fall Back in case of complex types
Laljo John Pullokkaran created HIVE-7967: Summary: CBO Trunk Merge: Fall Back in case of complex types Key: HIVE-7967 URL: https://issues.apache.org/jira/browse/HIVE-7967 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7944: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Pengcheng! current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Reporter: pengcheng xiong Assignee: pengcheng xiong Fix For: 0.14.0 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch We previously worked hard on faster stats updates for the columns of a table partition (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7944: --- Affects Version/s: 0.14.0 current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: pengcheng xiong Assignee: pengcheng xiong Fix For: 0.14.0 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch We previously worked hard on faster stats updates for the columns of a table partition (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct
[ https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7944: --- Component/s: Statistics current update stats for columns of a partition of a table is not correct - Key: HIVE-7944 URL: https://issues.apache.org/jira/browse/HIVE-7944 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.14.0 Reporter: pengcheng xiong Assignee: pengcheng xiong Fix For: 0.14.0 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch We previously worked hard on faster stats updates for the columns of a table partition (https://issues.apache.org/jira/browse/HIVE-7736 and https://issues.apache.org/jira/browse/HIVE-7876). Although there is some improvement, the result is only correct on the first run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7580) Support dynamic partitioning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120438#comment-14120438 ] Chinna Rao Lalam commented on HIVE-7580: I verified the tests below; all of them pass except load_dyn_part1.q and load_dyn_part8.q. {noformat} load_dyn_part1.q, load_dyn_part2.q, load_dyn_part3.q, load_dyn_part4.q, load_dyn_part5.q, load_dyn_part6.q, load_dyn_part7.q, load_dyn_part8.q, load_dyn_part9.q, load_dyn_part10.q, load_dyn_part11.q, load_dyn_part12.q, load_dyn_part13.q, load_dyn_part14.q, load_dyn_part15.q {noformat} To enable the dynamic partition tests, I considered the following tests (taken from Tez): {noformat} load_dyn_part1.q, load_dyn_part2.q, load_dyn_part3.q, dynpart_sort_optimization.q, dynpart_sort_opt_vectorization.q {noformat} The following tests fail: {noformat} load_dyn_part1.q, load_dyn_part8.q, dynpart_sort_optimization.q, dynpart_sort_opt_vectorization.q {noformat} Each of these 4 test cases is blocked by an existing issue. I will add these tests to the corresponding JIRAs and work on them there. {quote} load_dyn_part1.q and load_dyn_part8.q both contain multi-inserts. Need to test after HIVE-7503 is fixed. {quote} {quote} dynpart_sort_opt_vectorization.q is related to vectorization. Need to test after HIVE-7794 is fixed. {quote} {quote} dynpart_sort_optimization.q is hitting the same exception as HIVE-7843. {quote} Support dynamic partitioning [Spark Branch] --- Key: HIVE-7580 URL: https://issues.apache.org/jira/browse/HIVE-7580 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Labels: Spark-M1 My understanding is that we don't need to do anything special for this. However, this needs to be verified and tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120440#comment-14120440 ] Thejas M Nair commented on HIVE-7943: - The description of the configuration also states its purpose: the privileges automatically granted to the owner whenever a table gets created. The same is true of the user grants configuration. The purpose was not changed intentionally. The reason for separating user grants and owner grants was so that the owner user is set correctly when the owner is changed within a session (for ease of testing). hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization does not adapt to the change, and table owners do not inherit permissions from the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
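To make the "grants persisted at creation time" model concrete, here is a minimal Java sketch under simplified assumptions: owner grants are parsed from a comma-separated privilege list (such as "select,drop" or "ALL") and stored per table and user when the table is created, and the later authorization check treats ALL as implying every other privilege. Priv, OwnerGrantSketch, and their methods are hypothetical; Hive's actual authorization classes are not reproduced here.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, reduced privilege set; Hive's real privilege model is richer.
enum Priv { ALL, SELECT, ALTER, DROP }

final class OwnerGrantSketch {
  // Grants persisted at CREATE TABLE time, keyed by "table/user".
  private final Map<String, Set<Priv>> persisted = new HashMap<>();

  // Parses a comma-separated privilege list such as "select,drop" or "ALL".
  static Set<Priv> parse(String conf) {
    Set<Priv> out = EnumSet.noneOf(Priv.class);
    for (String p : conf.split(",")) {
      String name = p.trim();
      if (!name.isEmpty()) {
        out.add(Priv.valueOf(name.toUpperCase(Locale.ROOT)));
      }
    }
    return out;
  }

  // Models "privileges automatically granted to the owner whenever a table
  // gets created": the grants are written once, when the table is created.
  void createTable(String table, String owner, String ownerGrantsConf) {
    persisted.put(table + "/" + owner, parse(ownerGrantsConf));
  }

  // Authorization check against the persisted grants; ALL implies everything.
  boolean hasPrivilege(String table, String user, Priv needed) {
    Set<Priv> granted = persisted.getOrDefault(table + "/" + user, EnumSet.noneOf(Priv.class));
    return granted.contains(Priv.ALL) || granted.contains(needed);
  }
}
```

Under this model, the DROP in the reported repro should succeed whenever the persisted grant set contains ALL or DROP, so an empty 'show grant' result points at the grants never being written rather than at how ALL is interpreted.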
[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization
[ https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120444#comment-14120444 ] Thejas M Nair commented on HIVE-7943: - You can try tracing through the calls made from Hive.createTable to CreateTableAutomaticGrant.getUserGrants, where the grants are added to the table object. hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization -- Key: HIVE-7943 URL: https://issues.apache.org/jira/browse/HIVE-7943 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Ashu Pachauri Attachments: HIVE-7943.1.patch HIVE-6250 separates owner privileges from user privileges. However, Default Authorization does not adapt to the change, and table owners do not inherit permissions from the config. Steps to Reproduce: set hive.security.authorization.enabled=true; set hive.security.authorization.createtable.owner.grants=ALL; create table temp_table(id int, value string); drop table temp_table; The above set of operations throws the following error: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { database:default, table:temp_table}. Use SHOW GRANT to get more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120448#comment-14120448 ] Roshan Naik commented on HIVE-7508: --- [~leftylev], yes, thanks for bringing it up. I will work with [~alangates] on updating that. Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming, TODOC14 Fix For: 0.14.0 Attachments: HIVE-7508.patch Add Kerberos support for streaming to a secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
[ https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7968: --- Fix Version/s: 0.14.0 Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2 -- Key: HIVE-7968 URL: https://issues.apache.org/jira/browse/HIVE-7968 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
Vaibhav Gumashta created HIVE-7968: -- Summary: Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2 Key: HIVE-7968 URL: https://issues.apache.org/jira/browse/HIVE-7968 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7580: --- Attachment: HIVE-7580.patch Patch contains passed test cases. Support dynamic partitioning [Spark Branch] --- Key: HIVE-7580 URL: https://issues.apache.org/jira/browse/HIVE-7580 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-7580.patch My understanding is that we don't need to do anything special for this. However, this needs to be verified and tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
[ https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7968: --- Affects Version/s: 0.14.0 Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2 -- Key: HIVE-7968 URL: https://issues.apache.org/jira/browse/HIVE-7968 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
[ https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7968: --- Description: MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup when JDBC is the client API for HS2. (was: MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup.) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2 -- Key: HIVE-7968 URL: https://issues.apache.org/jira/browse/HIVE-7968 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 MiniHS2 uses MiniMr, so it makes no sense to have two test cases for the same setup when JDBC is the client API for HS2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7811: - Status: Patch Available (was: Open) Compactions need to update table/partition stats Key: HIVE-7811 URL: https://issues.apache.org/jira/browse/HIVE-7811 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch, HIVE-7811.5.patch, HIVE-7811.6.patch Compactions should trigger stats recalculation for columns that already have stats. https://reviews.apache.org/r/25201/ Major compactions will cause the Compactor to check which columns already have stats and run an analyze command for those columns. If a partition is being compacted, stats for that partition will be computed; if the table is not partitioned, stats for the whole table will be computed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
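A minimal Java sketch of the statement this description implies, assuming the compactor refreshes stats by issuing HiveQL of the form ANALYZE TABLE name [PARTITION(col='val', ...)] COMPUTE STATISTICS FOR COLUMNS col1, col2, limited to the columns that already have stats. CompactionStatsSketch and analyzeStatement are hypothetical names, and partition values are not escaped, so this is illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class CompactionStatsSketch {
  // Builds the ANALYZE statement to run after a major compaction.
  // partitionSpec is null or empty for an unpartitioned table, in which case
  // the whole table is analyzed. Returns null when no column has stats yet,
  // since there is then nothing to refresh.
  static String analyzeStatement(String table, Map<String, String> partitionSpec,
                                 List<String> colsWithStats) {
    if (colsWithStats.isEmpty()) {
      return null;
    }
    StringBuilder sb = new StringBuilder("ANALYZE TABLE ").append(table);
    if (partitionSpec != null && !partitionSpec.isEmpty()) {
      StringJoiner parts = new StringJoiner(", ", " PARTITION(", ")");
      for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
        parts.add(e.getKey() + "='" + e.getValue() + "'"); // no escaping
      }
      sb.append(parts);
    }
    sb.append(" COMPUTE STATISTICS FOR COLUMNS ").append(String.join(", ", colsWithStats));
    return sb.toString();
  }
}
```

Passing the partition being compacted yields a partition-scoped statement; passing null yields a whole-table statement, matching the two cases in the description.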
[jira] [Updated] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7208: --- Attachment: HIVE-7208.03.patch File deletion has not been rebased properly move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.01.patch, HIVE-7208.02.patch, HIVE-7208.03.patch, HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move the SearchArgument class to a place that is not in ql (because it's hard to depend on ql).
[jira] [Commented] (HIVE-7956) When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120484#comment-14120484 ] Brock Noland commented on HIVE-7956: I thought that MR does this by setting the number of reducers equal to the number of buckets. When inserting into a bucketed table, all data goes to a single bucket [Spark Branch] - Key: HIVE-7956 URL: https://issues.apache.org/jira/browse/HIVE-7956 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li I created a bucketed table: {code} create table testBucket(x int,y string) clustered by(x) into 10 buckets; {code} Then I run a query like: {code} set hive.enforce.bucketing = true; insert overwrite table testBucket select intCol,stringCol from src; {code} Here {{src}} is a simple textfile-based table containing 4000 records (not bucketed). The query launches 10 reduce tasks but all the data goes to only one of them.
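Brock's point about MR can be illustrated with a small sketch. It assumes Hive's usual bucketing rule, bucket = (hash & Integer.MAX_VALUE) % numBuckets, and assumes an int column hashes to its own value; with the reducer count equal to the bucket count, each bucket maps to exactly one reducer and distinct keys spread across all of them. The keys 0..3999 stand in for the 4000 rows of {{src}} and are hypothetical, as are the class and method names; this is not the Spark branch's code.

```java
import java.util.Arrays;

public class BucketSketch {
    // Maps a row's hash code to a bucket. Masking with Integer.MAX_VALUE keeps
    // negative hash codes from producing a negative bucket number.
    static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Counts how many rows land in each bucket (= each reducer when the
    // number of reducers equals the number of buckets).
    static int[] distribute(int[] keys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key : keys) {
            // Assumption: an int column's hash code is the int value itself.
            counts[bucketFor(Integer.hashCode(key), numBuckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[4000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        // Prints [400, 400, 400, 400, 400, 400, 400, 400, 400, 400]
        System.out.println(Arrays.toString(distribute(keys, 10)));
    }
}
```

If every row were routed with the same partitioning key instead of the bucket column's hash, all 4000 rows would land in a single bucket, which matches the symptom reported above.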
[jira] [Commented] (HIVE-6948) HiveServer2 doesn't respect HIVE_AUX_JARS_PATH
[ https://issues.apache.org/jira/browse/HIVE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120490#comment-14120490 ] Brock Noland commented on HIVE-6948: This is a dup of HIVE-6820. HiveServer2 doesn't respect HIVE_AUX_JARS_PATH -- Key: HIVE-6948 URL: https://issues.apache.org/jira/browse/HIVE-6948 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Peng Zhang Fix For: 0.14.0 Attachments: HIVE-6948.patch, HIVE-6948.patch HiveServer2 ignores HIVE_AUX_JARS_PATH. As a result, aux jars are not distributed to the YARN cluster, and jobs fail without their dependent jars.
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120495#comment-14120495 ] Gunther Hagleitner commented on HIVE-7826: -- Thanks [~damien.carol]. Your last comment definitely made my day :-) Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Components: Tez Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: TODOC14, tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in Hive require a full table scan of the large table, though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straightforward to send the values needed for pruning to the application master, where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straightforward:
- Insert synthetic conditions for each join representing x in (keys of other side in join)
- These conditions will be pushed as far down as possible
- If the condition hits a table scan and the column involved is a partition column:
  - Set up an operator to send key events to the AM
- else:
  - Remove the synthetic predicate
Add these properties:
||Property||Default Value||
|{{hive.tez.dynamic.partition.pruning}}|true|
|{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
|{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
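The pruning step itself can be sketched as filtering the fact table's partitions by the keys that arrive from the other side of the join. This is a minimal sketch, assuming the key events have already been collected into a set of partition values at the AM. The single maxKeys threshold is a hypothetical, simplified stand-in for the byte-based size limits in the table above, and the sketch assumes pruning is simply skipped when that threshold is exceeded. Class and method names are illustrative, not Hive's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DynamicPruningSketch {
    // Returns the partitions that still need to be scanned. A partition is kept
    // only if its partition-column value is among the join keys produced by the
    // other side of the join (the synthetic "x in (keys)" condition).
    static List<String> prune(List<String> partitionValues, Set<String> joinKeys, int maxKeys) {
        // Hypothetical size guard: too many keys means pruning is skipped and
        // every partition is scanned, as it would be without this feature.
        if (joinKeys.size() > maxKeys) {
            return new ArrayList<>(partitionValues);
        }
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (joinKeys.contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2014-09-01", "2014-09-02", "2014-09-03");
        Set<String> keys = new HashSet<>(Arrays.asList("2014-09-02"));
        // Prints [2014-09-02]: only one partition of the large table is scanned.
        System.out.println(prune(partitions, keys, 1000));
    }
}
```

Doing this at the AM, before splits are generated, is what lets unneeded partitions be dropped without any task ever reading them.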
[jira] [Created] (HIVE-7969) Use Optiq's native FieldTrimmer instead of HiveRelFieldTrimmer
Ashutosh Chauhan created HIVE-7969: -- Summary: Use Optiq's native FieldTrimmer instead of HiveRelFieldTrimmer Key: HIVE-7969 URL: https://issues.apache.org/jira/browse/HIVE-7969 Project: Hive Issue Type: Sub-task Components: CBO, Logical Optimizer Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan After the patch series OPTIQ-391, OPTIQ-392, OPTIQ-395 and OPTIQ-396, it's now possible to use Optiq's native FieldTrimmer. So, let's use it.
[jira] [Updated] (HIVE-6948) HiveServer2 doesn't respect HIVE_AUX_JARS_PATH
[ https://issues.apache.org/jira/browse/HIVE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6948: --- Resolution: Duplicate Fix Version/s: (was: 0.14.0) Status: Resolved (was: Patch Available) HiveServer2 doesn't respect HIVE_AUX_JARS_PATH -- Key: HIVE-6948 URL: https://issues.apache.org/jira/browse/HIVE-6948 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Peng Zhang Attachments: HIVE-6948.patch, HIVE-6948.patch HiveServer2 ignores HIVE_AUX_JARS_PATH. As a result, aux jars are not distributed to the YARN cluster, and jobs fail without their dependent jars.