date:20140903


[ 
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119453#comment-14119453
 ] 

Hive QA commented on HIVE-7943:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666082/HIVE-7943.1.patch

{color:green}SUCCESS:{color} +1 6142 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/611/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/611/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-611/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666082

 hive.security.authorization.createtable.owner.grants is ineffective with 
 Default Authorization
 --

 Key: HIVE-7943
 URL: https://issues.apache.org/jira/browse/HIVE-7943
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.13.1
Reporter: Ashu Pachauri
 Attachments: HIVE-7943.1.patch


 HIVE-6250 separates owner privileges from user privileges. However, Default 
 Authorization does not adapt to the change and table owners do not inherit 
 permissions from the config.
 Steps to Reproduce:
 set hive.security.authorization.enabled=true;
 set hive.security.authorization.createtable.owner.grants=ALL;
 create table temp_table(id int, value string);
 drop table temp_table;
 The above set of operations throws the following error:
 
 Authorization failed:No privilege 'Drop' found for outputs { 
 database:default, table:temp_table}. Use SHOW GRANT to get more details.
 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' 
 found for outputs { database:default, table:temp_table}. Use SHOW GRANT to 
 get more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 25245: Support dynamic service discovery for HiveServer2


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/#review52116
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/25245/#comment90858

zk_ seems redundant for a location in zookeeper



jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
https://reviews.apache.org/r/25245/#comment90880

As this is a param in the hiveserver2 jdbc URL, I think the hive.server2. part is 
redundant. That part makes the URL unnecessarily verbose. 

I realize we have two params which have this prefix, but I think we should 
remove it from those as well (in another jira).



jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
https://reviews.apache.org/r/25245/#comment90881

I think we are likely to have people wanting to implement other modes of 
dynamically picking the HS2 host. For example, you could simply have multiple 
HS2 hostnames in a URL (instead of zookeeper hosts). Or people might decide to 
store the hostnames in another place instead of zookeeper.

So I think instead of making this param a boolean, it is better to have the 
value as none (default) or zookeeper.

Maybe change the param name also to service.discovery.mode ?
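
One way to picture the suggestion above is an enum-valued parameter in place of a boolean. The following is only a sketch: the class `DiscoveryModeSketch`, the enum `ServiceDiscoveryMode`, and the helper `fromParams` are hypothetical names, and `service.discovery.mode` is the name floated in this comment rather than a confirmed Hive parameter.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: an enum-valued discovery mode instead of a boolean flag.
final class DiscoveryModeSketch {
    // "none" is the default; further modes could be added later without a new flag.
    enum ServiceDiscoveryMode { NONE, ZOOKEEPER }

    // Parameter name proposed in the review comment; assumed, not a confirmed name.
    static final String PARAM = "service.discovery.mode";

    static ServiceDiscoveryMode fromParams(Map<String, String> params) {
        String raw = params.get(PARAM);
        if (raw == null || raw.trim().isEmpty()) {
            return ServiceDiscoveryMode.NONE; // parameter absent: no discovery
        }
        try {
            return ServiceDiscoveryMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Unknown modes fail loudly rather than silently falling back.
            throw new IllegalArgumentException("Unsupported " + PARAM + ": " + raw, e);
        }
    }
}
```

With this shape, a later mode (for example a plain list of hostnames) would only need a new enum constant.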



jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
https://reviews.apache.org/r/25245/#comment90859

comment moved to wrong line ?



jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
https://reviews.apache.org/r/25245/#comment90883

I think we can still make use of the java URI class for parameter parsing 
by just parsing the hostname portion first. Custom parsing of params in this 
mode can introduce bugs or inconsistencies.

The JdbcConnectionParams can be expanded to give a list of hosts.
The Utils.parseURL can first extract and substitute the multiple hostnames 
(if any), and then use the regular java URI parsing.
We can then validate, after parsing, whether the current discovery mode 
supports multiple hosts.
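
The idea of extracting the hostnames first and then reusing the regular URI parsing could look roughly like the sketch below. It is not the code under review: `MultiHostUrlSketch`, `Parsed`, and `parse` are hypothetical names, the comma-separated host list is an assumed URL format, and embedded-mode URLs with an empty host are not handled.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: pull out a comma-separated host list, then let java.net.URI do the rest.
final class MultiHostUrlSketch {
    static final String PREFIX = "jdbc:hive2://";

    static final class Parsed {
        final List<String> hosts; // host:port entries, in URL order
        final URI uri;            // regular URI with only the first host substituted in
        Parsed(List<String> hosts, URI uri) { this.hosts = hosts; this.uri = uri; }
    }

    static Parsed parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a hive2 JDBC URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        // The authority ends at the first '/', ';', '?' or '#', or at end of string.
        int end = rest.length();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '/' || c == ';' || c == '?' || c == '#') { end = i; break; }
        }
        List<String> hosts = new ArrayList<>(Arrays.asList(rest.substring(0, end).split(",")));
        // Substitute a single host so the standard parser sees a normal authority.
        URI uri = URI.create("hive2://" + hosts.get(0) + rest.substring(end));
        return new Parsed(hosts, uri);
    }
}
```

The caller can then check `hosts.size()` against what the selected discovery mode allows, and read the path and parameters from `uri` as before.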


- Thejas Nair


On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25245/
 ---
 
 (Updated Sept. 2, 2014, 10:05 a.m.)
 
 
 Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.
 
 
 Bugs: HIVE-7935
 https://issues.apache.org/jira/browse/HIVE-7935
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 https://issues.apache.org/jira/browse/HIVE-7935
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 
   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 
   jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
  46044d0 
   ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java 
 PRE-CREATION 
   
 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java
  59294b1 
   service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7 
   
 service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
 21c33bc 
   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
 bc0a02c 
   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
 d573592 
   
 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
  37b05fc 
   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
 027931e 
   
 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
 c380b69 
   service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb 
   
 service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java
  66fc1fc 
 
 Diff: https://reviews.apache.org/r/25245/diff/
 
 
 Testing
 ---
 
 Manual testing + test cases.
 
 
 Thanks,
 
 Vaibhav Gumashta

Re: Review Request 25245: Support dynamic service discovery for HiveServer2


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/#review52139
---



jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
https://reviews.apache.org/r/25245/#comment90887

How big are the zookeeper jars ? If we use zookeeper in this class, I 
believe zookeeper jars will always be needed for jdbc driver.
It would be better to have the zookeeper service discovery code in a 
separate util class. That way we will need zookeeper jars only if this mode is 
used.



service/src/java/org/apache/hive/service/server/HiveServer2.java
https://reviews.apache.org/r/25245/#comment90889

It will be useful to log at info level that it is re-using the existing 
znode.



service/src/java/org/apache/hive/service/server/HiveServer2.java
https://reviews.apache.org/r/25245/#comment90890

It will be useful to have HS2 de-register itself if it gets a kill signal 
it can handle.
That part can be done as part of follow-up jira as well. (Until then admin 
will need to manually edit the zookeeper entry).
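
For illustration, de-registration on a handled signal can be sketched with a JVM shutdown hook, which runs on signals such as SIGTERM or SIGINT but not on SIGKILL. The names `DeregisterOnShutdownSketch`, `Registry`, and `installHook` are hypothetical; the real ZooKeeper cleanup would live behind the `Registry` interface.

```java
// Hypothetical sketch: remove a server's registration when the JVM shuts down normally.
final class DeregisterOnShutdownSketch {
    // Stand-in for whatever removes the server's znode; not a real Hive interface.
    interface Registry {
        void deregister(String instanceId);
    }

    static Thread installHook(final Registry registry, final String instanceId) {
        Thread hook = new Thread(new Runnable() {
            @Override
            public void run() {
                registry.deregister(instanceId);
            }
        }, "hs2-deregister");
        // Runs on normal exit and on signals the JVM can handle (not SIGKILL).
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Because a hard kill bypasses the hook, a stale entry is still possible, which is why manual cleanup is mentioned as the interim fallback.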


- Thejas Nair



[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---
Attachment: HIVE-5760.91.patch

 Add vectorized support for CHAR/VARCHAR data types
 --

 Key: HIVE-5760
 URL: https://issues.apache.org/jira/browse/HIVE-5760
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Matt McCline
 Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, 
 HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, 
 HIVE-5760.91.patch


 Add support to allow queries referencing VARCHAR columns and expression 
 results to run efficiently in vectorized mode. This should re-use the code 
 for the STRING type to the extent possible and beneficial. Include unit tests 
 and end-to-end tests. Consider re-using or extending existing end-to-end 
 tests for vectorized string operations.




[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---
Status: Patch Available  (was: In Progress)





Re: Review Request 25288: HIVE-7941: add unit test case for line wrapping


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25288/#review52145
---



beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java
https://reviews.apache.org/r/25288/#comment90895

Thanks for adding the test case for truncation. Can you also add one for 
the line wrapping?


- Thejas Nair


On Sept. 3, 2014, 5:48 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25288/
 ---
 
 (Updated Sept. 3, 2014, 5:48 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7941: add unit test case for line wrapping
 
 
 Diffs
 -
 
   beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25288/diff/
 
 
 Testing
 ---
 
 UT
 
 
 Thanks,
 
 cheng xu

Re: Review Request 25288: HIVE-7941: add unit test case for line wrapping



 On Sept. 3, 2014, 7:30 a.m., Thejas Nair wrote:
  beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java, line 45
  https://reviews.apache.org/r/25288/diff/1/?file=674876#file674876line45
 
  Thanks for adding the test case for truncation. Can you also add one 
  for the line wrapping?

I mean, when a long input line is printed across multiple lines.


- Thejas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25288/#review52145
---



[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk


 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7946:
---
Component/s: CBO

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.patch







[jira] [Updated] (HIVE-5775) Introduce Cost Based Optimizer to Hive


 [ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-5775:
---
Component/s: CBO

 Introduce Cost Based Optimizer to Hive
 --

 Key: HIVE-5775
 URL: https://issues.apache.org/jira/browse/HIVE-5775
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: CBO-2.pdf, HIVE-5775.1.patch







[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions


[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119512#comment-14119512
 ] 

Hive QA commented on HIVE-7223:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666093/HIVE-7223.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6145 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/612/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/612/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-612/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666093

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
 HIVE-7223.4.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
     public List<Partition> getPartitions();
     public List<String> getPartNames();
     public Iterator<Partition> getPartitionIter();
     public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
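
A rough sketch of the shared-descriptor idea with iterator semantics follows. Everything in it is hypothetical: `SharedSdPartitionSpecSketch`, `DirBasedSpec`, and the pared-down `StorageDescriptor` and `Partition` classes are stand-ins for illustration, not the metastore types or the patch's implementation.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: many partitions sharing one StorageDescriptor, produced lazily.
final class SharedSdPartitionSpecSketch {
    // Pared-down stand-in for the metastore StorageDescriptor (location only).
    static final class StorageDescriptor {
        final String location;
        StorageDescriptor(String location) { this.location = location; }
    }

    // Lightweight partition: a relative path plus a reference to the shared descriptor.
    static final class Partition {
        final String relativePath;
        final StorageDescriptor sd;
        Partition(String relativePath, StorageDescriptor sd) {
            this.relativePath = relativePath;
            this.sd = sd;
        }
        String location() { return sd.location + "/" + relativePath; }
    }

    // One root directory, many sub-directories; partitions are created only when iterated.
    static final class DirBasedSpec implements Iterable<Partition> {
        private final StorageDescriptor root;
        private final List<String> subDirs;

        DirBasedSpec(String rootLocation, List<String> subDirs) {
            this.root = new StorageDescriptor(rootLocation);
            this.subDirs = subDirs;
        }

        @Override
        public Iterator<Partition> iterator() {
            final Iterator<String> names = subDirs.iterator();
            return new Iterator<Partition>() {
                @Override public boolean hasNext() { return names.hasNext(); }
                @Override public Partition next() { return new Partition(names.next(), root); }
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }
}
```

Only the sub-directory names and one descriptor are held in memory; each `Partition` object exists only while the caller is looking at it.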
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which

[jira] [Updated] (HIVE-7324) CBO: provide a mechanism to test CBO features based on table stats only (w/o table data)

[
https://issues.apache.org/jira/browse/HIVE-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Damien Carol updated HIVE-7324:
---
Component/s: CBO

CBO: provide a mechanism to test CBO features based on table stats only (w/o
table data)

Key: HIVE-7324
URL: https://issues.apache.org/jira/browse/HIVE-7324
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
Attachments: HIVE-7324.1.patch, HIVE-7324.2.patch

Since a lot of the CBO work is focused on planning, it would be nice to be able
to run an explain query to test CBO features. TPCDS has a rich enough schema and
query set, so the patch loads a dump of TPCDS (Scale 1) stats.
1. TestCBO shows a way to load stats from a dump and run explain on a TPCDS
query. The output is currently dumped to System.out. This can be improved by
hooking into QTestUtil, but hopefully this is a good start.
2. Uncovered a couple of issues in the process of testing this:
a) PartitionPruner fails on 'true' constants. E.g. you will get an error
for
{code:sql}
SELECT *
FROM t WHERE
partCol 100 AND true
{code}
This gets exposed because the predicates coming out of Optiq can contain
'true' predicates.
b) OpTraitsRulesProcFactory:checkBucketedTable checks that number of files =
numBuckets. This fails because there are no dataFiles. So I have altered it
to catch exceptions and assume bucketMapJoinConvertible = false if an
exception is encountered here.
Uploading with these changes in this patch for now. Will carve them out as
separate patches.
[~ashutoshc], [~hagleitn] can you please take a look.


[jira] [Updated] (HIVE-7280) CBO V1


 [ 
https://issues.apache.org/jira/browse/HIVE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7280:
---
Component/s: CBO

 CBO V1
 --

 Key: HIVE-7280
 URL: https://issues.apache.org/jira/browse/HIVE-7280
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7280.patch







[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end


 [ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7689:
---
Attachment: HIVE-7889.4.patch

Rebased

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, 
 HIVE-7889.4.patch


 I maintain a few patches that make the Metastore work with a Postgres back end 
 in our production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.




[jira] [Created] (HIVE-7956) When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]

2014-09-03 Thread Rui Li (JIRA)

Rui Li created HIVE-7956:


 Summary: When inserting into a bucketed table, all data goes to a 
single bucket [Spark Branch]
 Key: HIVE-7956
 URL: https://issues.apache.org/jira/browse/HIVE-7956
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li


I created a bucketed table:
{code}
create table testBucket(x int,y string) clustered by(x) into 10 buckets;
{code}
Then I run a query like:
{code}
set hive.enforce.bucketing = true;
insert overwrite table testBucket select intCol,stringCol from src;
{code}
Here {{src}} is a simple textfile-based table containing 4000 records (not 
bucketed). The query launches 10 reduce tasks but all the data goes to only one 
of them.
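
For comparison, the expected spread can be sketched with a hash-modulo rule. This assumes that an int key hashes to its own value and that the bucket is `(hash & Integer.MAX_VALUE) % numBuckets`; the class `BucketAssignmentSketch` and its methods are hypothetical, and the contents of {{src}} are not known, so the evenly spread keys are an assumption too.

```java
// Hypothetical sketch: how rows would be expected to spread under hash-modulo bucketing.
final class BucketAssignmentSketch {
    // Assumed rule: an int hashes to itself; the sign bit is masked before the modulo.
    static int bucketFor(int key, int numBuckets) {
        int hash = Integer.hashCode(key);
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Counts how many keys land in each bucket.
    static int[] histogram(int[] keys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key : keys) {
            counts[bucketFor(key, numBuckets)]++;
        }
        return counts;
    }
}
```

Under these assumptions, 4000 distinct consecutive keys and 10 buckets would give 400 rows per bucket, so a single non-empty reducer points at the partitioning step rather than at the data.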




[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119667#comment-14119667
 ] 

Hive QA commented on HIVE-7944:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666101/HIVE-7944.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6142 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/613/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/613/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-613/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666101

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Updated] (HIVE-7794) Enable tests on Spark branch (4) [Spark Branch]

2014-09-03 Thread Chinna Rao Lalam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7794:
---
Status: Open  (was: Patch Available)

 Enable tests on Spark branch (4) [Spark Branch]
 

 Key: HIVE-7794
 URL: https://issues.apache.org/jira/browse/HIVE-7794
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7794-spark.patch


 This jira is to enable *most* of the tests below. If tests don't pass because 
 of some unsupported feature, ensure that a JIRA exists and move on.
 {noformat}
   vector_cast_constant.q,\
   vector_data_types.q,\
   vector_decimal_aggregate.q,\
   vector_left_outer_join.q,\
   vector_string_concat.q,\
   vectorization_12.q,\
   vectorization_13.q,\
   vectorization_14.q,\
   vectorization_15.q,\
   vectorization_9.q,\
   vectorization_part_project.q,\
   vectorization_short_regress.q,\
   vectorized_mapjoin.q,\
   vectorized_nested_mapjoin.q,\
   vectorized_ptf.q,\
   vectorized_shufflejoin.q,\
   vectorized_timestamp_funcs.q
 {noformat}




[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup


[ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119727#comment-14119727
 ] 

Hive QA commented on HIVE-6847:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666115/HIVE-6847.9.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6134 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcWithMiniMr.org.apache.hive.jdbc.TestJdbcWithMiniMr
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.org.apache.hive.service.TestHS2ImpersonationWithRemoteMS
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/614/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/614/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-614/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666115

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, 
 HIVE-6847.4.patch, HIVE-6847.5.patch, HIVE-6847.6.patch, HIVE-6847.7.patch, 
 HIVE-6847.8.patch, HIVE-6847.9.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 first iteration of the HIVE-6782 patch for the approach.




[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez


 [ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7826:
-
Attachment: HIVE-7826.7.patch

.7 is rebased.

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in select ... from ...).
 The resulting joins in hive require a full table scan of the large table 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master - where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically, 
 while the query is running.
 The approach is straightforward:
 - Insert synthetic conditions for each join representing x in (keys of other 
 side in join)
 - These conditions will be pushed as far down as possible
 - If the condition hits a table scan and the column involved is a partition 
 column:
- Set up Operator to send key events to AM
 - else:
- Remove synthetic predicate
 Add these properties:
 ||Property||Default Value||
 |{{hive.tez.dynamic.partition.pruning}}|true|
 |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
 |{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
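
The pruning step itself can be pictured with a small sketch: once the join keys from the other side are known, only partitions whose partition-column value appears among them are kept. This is an illustration under stated assumptions, not the Tez implementation; `DynamicPruningSketch` and `prune` are hypothetical names, and partitions are modelled as a simple map from path to partition value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep only partitions whose value matches a key seen on the other join side.
final class DynamicPruningSketch {
    // partitions maps a partition path to its partition-column value.
    static List<String> prune(Map<String, String> partitions, Set<String> joinKeys) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> entry : partitions.entrySet()) {
            if (joinKeys.contains(entry.getValue())) {
                kept.add(entry.getKey()); // this partition can contribute join matches
            }
        }
        return kept;
    }
}
```

Splits would then be generated only for the returned paths, which is what avoids the full scan of the large table.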




[jira] [Resolved] (HIVE-7826) Dynamic partition pruning on Tez


 [ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-7826.
--
Resolution: Fixed

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in select ... from ...).
 The resulting joins in Hive require a full table scan of the large table 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master - where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically, 
 while the query is running.
 The approach is straightforward:
 - Insert synthetic conditions for each join representing x in (keys of other 
 side in join)
 - These conditions will be pushed as far down as possible
 - If the condition hits a table scan and the column involved is a partition 
 column:
   - Set up Operator to send key events to AM
 - else:
   - Remove synthetic predicate
 Add these properties:
 ||Property||Default Value||
 |{{hive.tez.dynamic.partition.pruning}}|true|
 |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
 |{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
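
 As a rough illustration of the kind of query this targets, here is a HiveQL 
 sketch that joins a partitioned fact table to a small dimension table. The 
 table and column names (sales, date_dim, ds, is_holiday, amount) are 
 hypothetical and do not come from the issue; only the two property names used 
 below are taken from the table above.

```sql
-- Hypothetical star schema: 'sales' is a large fact table partitioned by ds,
-- 'date_dim' is a small dimension table. Names are illustrative only.
CREATE TABLE date_dim (ds STRING, is_holiday BOOLEAN);
CREATE TABLE sales (item_id INT, amount DOUBLE) PARTITIONED BY (ds STRING);

SET hive.execution.engine=tez;
SET hive.tez.dynamic.partition.pruning=true;
-- 1*1024*1024, the default listed in the table above
SET hive.tez.dynamic.partition.pruning.max.event.size=1048576;

-- Join form: the values of d.ds that survive the filter are only known
-- while the query is running, so static partition pruning cannot use them.
SELECT s.ds, SUM(s.amount)
FROM sales s JOIN date_dim d ON (s.ds = d.ds)
WHERE d.is_holiday = true
GROUP BY s.ds;

-- Subquery form ("where p in select ... from ...").
SELECT SUM(s.amount)
FROM sales s
WHERE s.ds IN (SELECT d.ds FROM date_dim d WHERE d.is_holiday = true);
```

 Running EXPLAIN on either query should show whether the synthetic predicate 
 reached the table scan of the fact table; the exact shape of the plan depends 
 on the Hive build, so treat this as a way to experiment rather than a 
 reference.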




[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez


[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119743#comment-14119743
 ] 

Gunther Hagleitner commented on HIVE-7826:
--

Committed to branch. Thanks [~vikram.dixit]. [~damien.carol] thanks for trying 
it out. Let me know if you're still having problems with this. I'll address in 
follow up if need be.

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch



[jira] [Created] (HIVE-7957) Revisit event version handling in dynamic partition pruning on Tez

Gunther Hagleitner created HIVE-7957:


 Summary: Revisit event version handling in dynamic partition 
pruning on Tez
 Key: HIVE-7957
 URL: https://issues.apache.org/jira/browse/HIVE-7957
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner


Once TEZ-1447 is resolved, we should be able to simplify the handling of event 
versions.




[jira] [Commented] (HIVE-7924) auto_sortmerge_join_8 sometimes fails with OOM


[ 
https://issues.apache.org/jira/browse/HIVE-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119750#comment-14119750
 ] 

Gunther Hagleitner commented on HIVE-7924:
--

LGTM +1. Won't hurt - did you see it fix the issue though?

 auto_sortmerge_join_8 sometimes fails with OOM
 --

 Key: HIVE-7924
 URL: https://issues.apache.org/jira/browse/HIVE-7924
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7294.patch


 Saw in some runs of this test, the following in the 
 [log|http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-572/failed/TestCliDriver-rcfile_merge1.q-fileformat_text.q-stats2.q-and-12-more/hive.log]:
 {noformat}
 (MapredLocalTask.java:executeInProcess(321)) - Hive Runtime Error: Map local work exhausted memory
 org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2014-08-29 08:31:56  Processing rows:4   Hashtable size: 3   Memory usage:   1531884480  percentage: 0.802
   at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
   at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:251)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 {noformat}




[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner


[ 
https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119761#comment-14119761
 ] 

Hive QA commented on HIVE-7949:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666116/HIVE-7949.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6142 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/615/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/615/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-615/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666116

 Create table LIKE command doesn't set new owner
 ---

 Key: HIVE-7949
 URL: https://issues.apache.org/jira/browse/HIVE-7949
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.13.1
Reporter: Pala M Muthaia
Assignee: Pala M Muthaia
 Fix For: 0.13.0, 0.13.1

 Attachments: HIVE-7949.1.patch


 The 'Create table like' command doesn't set the current user as the owner of 
 the new table; instead, the new table's owner is the same as the source 
 table's owner.
 This is a regression from 0.12.
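
 A minimal way to check this, sketched in HiveQL. The table names are 
 hypothetical, and the sketch assumes two different users: the first 
 statement is run as user A, the rest as user B.

```sql
-- As user A (hypothetical source table):
CREATE TABLE src_table (id INT, value STRING);

-- As user B:
CREATE TABLE copy_table LIKE src_table;

-- The "Owner:" field in the output shows who owns the new table.
-- Expected: user B. Behavior described in this issue: user A,
-- i.e. the owner of src_table.
DESCRIBE FORMATTED copy_table;
```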




[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package


[ 
https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119865#comment-14119865
 ] 

Hive QA commented on HIVE-7208:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666124/HIVE-7208.02.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6142 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_timestamp
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/617/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/617/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-617/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666124

 move SearchArgument interface into serde package
 

 Key: HIVE-7208
 URL: https://issues.apache.org/jira/browse/HIVE-7208
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-7208.01.patch, HIVE-7208.02.patch, HIVE-7208.patch


 For usage in alternative input formats/serdes, it might be useful to move 
 SearchArgument class to a place that is not in ql (because it's hard to 
 depend on ql).




[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez


 [ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-7826:
---
Component/s: Tez

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
  Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch



[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez


[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119846#comment-14119846
 ] 

Damien Carol commented on HIVE-7826:


I tested again with the latest version of the tez branch.
I can confirm that it works. Massive performance improvement with this patch.
Many of our OLAP cubes are partitioned by year.
We can now filter down to just 1 or 2 years, which reduces query time.
Thanks a lot [~hagleitn]

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch



[jira] [Commented] (HIVE-7951) InputFormats implementing (Job)Configurable should not be cached


[ 
https://issues.apache.org/jira/browse/HIVE-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119815#comment-14119815
 ] 

Hive QA commented on HIVE-7951:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666119/HIVE-7951.1.patch.txt

{color:green}SUCCESS:{color} +1 6142 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/616/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/616/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-616/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666119

 InputFormats implementing (Job)Configurable should not be cached
 

 Key: HIVE-7951
 URL: https://issues.apache.org/jira/browse/HIVE-7951
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7951.1.patch.txt


 Currently, the initial configuration instance is shared with all subsequent 
 input formats, which should not be the case.




[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


[ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119917#comment-14119917
 ] 

Hive QA commented on HIVE-5760:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666176/HIVE-5760.91.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/619/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/619/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-619/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-619/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen contrib/target 
service/target serde/target beeline/target 
beeline/src/test/org/apache/hive/beeline/TestTableOutputFormat.java odbc/target 
cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1622274.

At revision 1622274.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666176

 Add vectorized support for CHAR/VARCHAR data types
 --

 Key: HIVE-5760
 URL: https://issues.apache.org/jira/browse/HIVE-5760
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Matt McCline
 Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, 
 HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, 
 HIVE-5760.91.patch


 Add support to allow queries referencing VARCHAR columns and expression 
 results to run efficiently in vectorized mode. This should re-use the code 
 for the STRING type to the extent possible and beneficial. Include unit tests 
 and end-to-end tests. Consider re-using or extending existing end-to-end 
 tests for vectorized string operations.
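
 For an end-to-end test, a query of roughly the following shape could be used. 
 This is only a sketch: the table and its columns are hypothetical, and it 
 assumes ORC storage, which vectorized execution reads in this Hive 
 generation.

```sql
-- Hypothetical table with VARCHAR and CHAR columns, stored as ORC.
CREATE TABLE customers_orc (id INT, name VARCHAR(50), country CHAR(2))
STORED AS ORC;

SET hive.vectorized.execution.enabled=true;

-- A filter and a projection over VARCHAR/CHAR columns, mirroring what
-- existing end-to-end tests do for STRING columns.
EXPLAIN
SELECT name, country
FROM customers_orc
WHERE name LIKE 'A%' AND country = 'US';
```

 If the plan is vectorized, the EXPLAIN output is expected to say so for the 
 map work (typically as an "Execution mode: vectorized" line); the same query 
 run with and without the setting should return identical rows.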




[jira] [Commented] (HIVE-7571) RecordUpdater should read virtual columns from row

2014-09-03 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119923#comment-14119923
 ] 

Owen O'Malley commented on HIVE-7571:
-

+1 LGTM

You might want to replace s/recIdCol/RecordIdColumn/g to be more readable, 
since it is a public API.

 RecordUpdater should read virtual columns from row
 --

 Key: HIVE-7571
 URL: https://issues.apache.org/jira/browse/HIVE-7571
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7571.2.patch, HIVE-7571.WIP.patch, HIVE-7571.patch


 Currently RecordUpdater.update and delete take rowid and original transaction 
 as parameters.  These values are already present in the row as part of the 
 new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer 
 from there.  And the writer will already have to handle skipping ROW__ID when 
 writing, so it needs to be aware of that column anyway.
 We could instead read the values from ROW__ID and then remove it from the 
 object inspector in FileSinkOperator, but this will be hard in the 
 vectorization case where rows are being dealt with 10k at a time.
 For these reasons it makes more sense to do this work in the writer.




Re: Review Request 25245: Support dynamic service discovery for HiveServer2

2014-09-03 Thread Alan Gates


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25245/#review52171
---



service/src/java/org/apache/hive/service/server/HiveServer2.java
https://reviews.apache.org/r/25245/#comment90921

It seems like we want more than warn here if we fail to create the parent 
node.  In this case we'll be unable to create the node for this instance, and 
clients will be unable to find the server.  I would think this should be fatal.



service/src/java/org/apache/hive/service/server/HiveServer2.java
https://reviews.apache.org/r/25245/#comment90922

Agree we should have a clean shutdown case.  The timeout was 3 minutes I 
think, which means clients will keep trying to contact the server for a while 
after it shuts down.


- Alan Gates


On Sept. 2, 2014, 10:05 a.m., Vaibhav Gumashta wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25245/
 ---
 
 (Updated Sept. 2, 2014, 10:05 a.m.)
 
 
 Review request for hive, Alan Gates, Navis Ryu, Szehon Ho, and Thejas Nair.
 
 
 Bugs: HIVE-7935
 https://issues.apache.org/jira/browse/HIVE-7935
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 https://issues.apache.org/jira/browse/HIVE-7935
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7
   jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java PRE-CREATION
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0
   ql/src/java/org/apache/hadoop/hive/ql/util/ZooKeeperHiveHelper.java PRE-CREATION
   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/zookeeper/TestZookeeperLockManager.java 59294b1
   service/src/java/org/apache/hive/service/cli/CLIService.java 08ed2e7
   service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 21c33bc
   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c
   service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592
   service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc
   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 027931e
   service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69
   service/src/java/org/apache/hive/service/server/HiveServer2.java 0864dfb
   service/src/test/org/apache/hive/service/cli/session/TestSessionGlobalInitFile.java 66fc1fc
 
 Diff: https://reviews.apache.org/r/25245/diff/
 
 
 Testing
 ---
 
 Manual testing + test cases.
 
 
 Thanks,
 
 Vaibhav Gumashta

[jira] [Commented] (HIVE-7941) add test case for beeline line wrapping


[ 
https://issues.apache.org/jira/browse/HIVE-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119916#comment-14119916
 ] 

Hive QA commented on HIVE-7941:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666158/HIVE-7941.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6143 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/618/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/618/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-618/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666158

 add test case for beeline line wrapping
 ---

 Key: HIVE-7941
 URL: https://issues.apache.org/jira/browse/HIVE-7941
 Project: Hive
  Issue Type: Bug
  Components: Clients, JDBC
Affects Versions: 0.14.0
Reporter: Thejas M Nair
Assignee: Ferdinand Xu
 Attachments: HIVE-7941.patch


 The patch HIVE-6928 does not add tests that actually verify that line 
 wrapping takes place.
 It would be good to have a test for this.




[jira] [Commented] (HIVE-7890) SessionState creates HMS Client while not impersonating


[ 
https://issues.apache.org/jira/browse/HIVE-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119997#comment-14119997
 ] 

Brock Noland commented on HIVE-7890:


Thank you Dong and Prasad! I have committed this to trunk.

 SessionState creates HMS Client while not impersonating
 ---

 Key: HIVE-7890
 URL: https://issues.apache.org/jira/browse/HIVE-7890
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7890.2.patch


 In SessionState.start [an instance of the HMSClient is 
 created|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L367].
  When impersonation is enabled, this call does not occur within a doAs call 
 and thus the HMSClient is created as the server user, not the impersonated 
 user.
 Thus calls to the HMS are made by the hive user as opposed to the end user. 
 This causes file ownership, such as the owner of a database directory, to be 
 incorrect. While debugging this, I got the stack trace below. As you can see, 
 we are calling getMSC without a doAs.
 {noformat}
   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2474)
   at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:367)
   at org.apache.hive.service.cli.session.HiveSessionImpl.init(HiveSessionImpl.java:121)
   at org.apache.hive.service.cli.session.HiveSessionImplwithUGI.init(HiveSessionImplwithUGI.java:49)
   at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:130)
   at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:163)
   at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:290)
   at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:208)
   at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1313)
   at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1298)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 {noformat}




[jira] [Updated] (HIVE-7890) SessionState creates HMS Client while not impersonating


 [ 
https://issues.apache.org/jira/browse/HIVE-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7890:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

 SessionState creates HMS Client while not impersonating
 ---

 Key: HIVE-7890
 URL: https://issues.apache.org/jira/browse/HIVE-7890
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7890.2.patch



[jira] [Created] (HIVE-7958) SparkWork generated by SparkCompiler may require multiple Spark jobs to run

2014-09-03 Thread Xuefu Zhang (JIRA)

Xuefu Zhang created HIVE-7958:
-

 Summary: SparkWork generated by SparkCompiler may require multiple 
Spark jobs to run
 Key: HIVE-7958
 URL: https://issues.apache.org/jira/browse/HIVE-7958
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Priority: Critical


A SparkWork instance currently may contain disjoint work graphs. For 
instance, union_remove_1.q may generate a plan like this:
{code}
Reduce2 <- Map 1
Reduce4 <- Map 3
{code}
The SparkPlan instance generated from this work graph contains two result RDDs. 
When such a plan is executed, we call .foreach() on the two RDDs sequentially, 
which results in two Spark jobs, one after the other.

While this works functionally, performance suffers because the Spark 
jobs are run sequentially rather than concurrently.

Another side effect of this is that the corresponding SparkPlan instance is 
over-complicated.

There are two potential approaches:

1. Let SparkCompiler generate a work that can be executed in ONE Spark job 
only. In the above example, two Spark tasks should be generated.

2. Let SparkPlanGenerator generate multiple Spark plans and then have SparkClient 
execute them concurrently.

Approach #1 seems more reasonable and fits our architecture more naturally. Also, 
Hive's task execution framework already takes care of task concurrency.
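
The cost of running independent work graphs one after the other, compared with submitting them together, can be illustrated with a minimal sketch that does not use Spark at all. Everything here is an assumption made for illustration: DisjointGraphRunner, job(), runSequentially() and runConcurrently() are hypothetical names, and each "job" is just a Callable that sleeps. It is not how SparkClient or Hive's task framework is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not a Hive or Spark class.
final class DisjointGraphRunner {

    // Stand-in for one connected work graph; sleeping simulates the job's run time.
    static Callable<String> job(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // One job after the other: total time is roughly the sum of all jobs.
    static List<String> runSequentially(List<Callable<String>> jobs) throws Exception {
        List<String> finished = new ArrayList<>();
        for (Callable<String> j : jobs) {
            finished.add(j.call());
        }
        return finished;
    }

    // All independent jobs submitted at once: total time is roughly the slowest job.
    static List<String> runConcurrently(List<Callable<String>> jobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            List<String> finished = new ArrayList<>();
            // invokeAll returns the futures in the same order as the submitted jobs.
            for (Future<String> f : pool.invokeAll(jobs)) {
                finished.add(f.get());
            }
            return finished;
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same results in the same order; only the elapsed time differs, which is the point made about sequential versus concurrent execution.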




[jira] [Updated] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required


 [ 
https://issues.apache.org/jira/browse/HIVE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7682:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you Prasad for the review! I have committed this to trunk!

 HadoopThriftAuthBridge20S should not reset configuration unless required
 

 Key: HIVE-7682
 URL: https://issues.apache.org/jira/browse/HIVE-7682
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7682.1.patch, HIVE-7682.2.patch, HIVE-7682.3.patch


 In the HadoopThriftAuthBridge20S methods createClientWithConf and 
 getCurrentUGIWithConf we create new Configuration objects so we can set the 
 authentication type. When loading the new Configuration object, it looks for 
 the core-site.xml of the cluster it's connected to.
 This causes issues for Oozie, since Oozie does not have access to 
 core-site.xml as it's cluster agnostic.




[jira] [Updated] (HIVE-7923) populate stats for test tables


 [ 
https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7923:
---
Status: Open  (was: Patch Available)

Need to update .q.out files

 populate stats for test tables
 --

 Key: HIVE-7923
 URL: https://issues.apache.org/jira/browse/HIVE-7923
 Project: Hive
  Issue Type: Improvement
Reporter: pengcheng xiong
Assignee: pengcheng xiong
Priority: Minor
 Attachments: HIVE-7923.1.patch, HIVE-7923.2.patch


 Current q_test only generates tables (e.g., src) but does not create 
 stats. All the test cases will fail in CBO because CBO depends on the 
 stats. 




[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120011#comment-14120011
 ] 

Ashutosh Chauhan commented on HIVE-7944:


[~pxiong] Have you tested this with mysql which has tables pre-created (not 
auto created via mysql)? I think there might be issues because in those cases 
csid & partid being null won't be inserted in db. 
I think we should simply revert HIVE-7876 until we figure out a proper fix for 
this.

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Comment Edited] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120011#comment-14120011
 ] 

Ashutosh Chauhan edited comment on HIVE-7944 at 9/3/14 4:04 PM:


[~pxiong] Have you tested this with mysql which has tables pre-created (not 
auto created via datanucleus)? I think there might be issues because in those 
cases csid & partid being null will prevent data from getting inserted in db. 
I think we should simply revert HIVE-7876 until we figure out a proper fix for 
this.


was (Author: ashutoshc):
[~pxiong] Have you tested this with mysql which has tables pre-created (not 
auto created via mysql)? I think there might be issues because in those cases 
csid & partid being null won't be inserted in db. 
I think we should simply revert HIVE-7876 until we figure out a proper fix for 
this.

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end


[ 
https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120012#comment-14120012
 ] 

Hive QA commented on HIVE-7689:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666180/HIVE-7889.4.patch

{color:green}SUCCESS:{color} +1 6142 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/620/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/620/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-620/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666180

 Enable Postgres as METASTORE back-end
 -

 Key: HIVE-7689
 URL: https://issues.apache.org/jira/browse/HIVE-7689
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Minor
  Labels: metastore, postgres
 Fix For: 0.14.0

 Attachments: HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, 
 HIVE-7889.4.patch


 I maintain a few patches to make the Metastore work with a Postgres back end in our 
 production environment.
 The main goal of this JIRA is to push these patches upstream.
 This patch enables LOCKS and COMPACTION and fixes an error in STATS on the metastore.




[jira] [Created] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5

Brock Noland created HIVE-7959:
--

 Summary: TestHadoop20SAuthBridge fails to compile under hadoop 2.5
 Key: HIVE-7959
 URL: https://issues.apache.org/jira/browse/HIVE-7959
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland


I tested against Hadoop 2.5 and it fails to compile due to the use of private APIs 
which changed.




[jira] [Updated] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5


 [ 
https://issues.apache.org/jira/browse/HIVE-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7959:
---
Attachment: HIVE-7959.1.patch

 TestHadoop20SAuthBridge fails to compile under hadoop 2.5
 -

 Key: HIVE-7959
 URL: https://issues.apache.org/jira/browse/HIVE-7959
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
 Attachments: HIVE-7959.1.patch


 I tested against Hadoop 2.5 and it fails to compile due to the use of private APIs 
 which changed.




[jira] [Created] (HIVE-7960) Upgrade to Hadoop 2.5

Brock Noland created HIVE-7960:
--

 Summary: Upgrade to Hadoop 2.5
 Key: HIVE-7960
 URL: https://issues.apache.org/jira/browse/HIVE-7960
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland


Tracking JIRA for upgrading to 2.5




[jira] [Resolved] (HIVE-7959) TestHadoop20SAuthBridge fails to compile under hadoop 2.5


 [ 
https://issues.apache.org/jira/browse/HIVE-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-7959.

Resolution: Duplicate

 TestHadoop20SAuthBridge fails to compile under hadoop 2.5
 -

 Key: HIVE-7959
 URL: https://issues.apache.org/jira/browse/HIVE-7959
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
 Attachments: HIVE-7959.1.patch


 I tested against Hadoop 2.5 and it fails to compile due to the use of private APIs 
 which changed.




[jira] [Updated] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448


 [ 
https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7184:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7960

 TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
 -

 Key: HIVE-7184
 URL: https://issues.apache.org/jira/browse/HIVE-7184
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Jason Dere
 Attachments: HIVE-7184.1.patch, HIVE-7184.2.patch


 HADOOP-10448 moves a couple of methods which were being used by the 
 TestHadoop20SAuthBridge test. If/when Hive build uses Hadoop 2.5 as a 
 dependency, this will cause compilation errors.




[jira] [Commented] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448


[ 
https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120082#comment-14120082
 ] 

Brock Noland commented on HIVE-7184:


Made subtask of HIVE-7960.

 TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
 -

 Key: HIVE-7184
 URL: https://issues.apache.org/jira/browse/HIVE-7184
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 0.14.0
Reporter: Jason Dere
 Attachments: HIVE-7184.1.patch, HIVE-7184.2.patch


 HADOOP-10448 moves a couple of methods which were being used by the 
 TestHadoop20SAuthBridge test. If/when Hive build uses Hadoop 2.5 as a 
 dependency, this will cause compilation errors.




[jira] [Commented] (HIVE-7553) avoid the scheduling maintenance window for every jar change


[ 
https://issues.apache.org/jira/browse/HIVE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120093#comment-14120093
 ] 

Brock Noland commented on HIVE-7553:


[~Ferd] the jars cannot be included in the patch itself. Can you upload them to 
the JIRA for manual testing?

Thanks!!

 avoid the scheduling maintenance window for every jar change
 

 Key: HIVE-7553
 URL: https://issues.apache.org/jira/browse/HIVE-7553
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-7553.1.patch, HIVE-7553.2.patch, HIVE-7553.3.patch, 
 HIVE-7553.patch, HIVE-7553.pdf


 When a user needs to refresh an existing jar or add a new jar to HS2, HS2 needs to 
 be restarted. As HS2 is a service exposed to clients, this requires scheduling a 
 maintenance window for every jar change. It would be great if we could avoid 
 that.




[jira] [Commented] (HIVE-7948) Add an E2E test to verify fix for HIVE-7155

2014-09-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120098#comment-14120098
 ] 

Eugene Koifman commented on HIVE-7948:
--

It seems like this test requires manual setup steps.  This complicates running 
the tests and in practice usually means that they are not being run.  If you 
look at hcatalog/src/tests/e2e/templeton/deployers there is a set of scripts 
that help automate the setup.
It should be easy to modify deploy_e2e_artifacts to make sure the newly 
required data files are copied to hdfs.  
config/ has some precanned config files - this logic may need to be improved a 
bit to be able to deploy a -site.xml file specific to a given group of tests.

 Add an E2E test  to verify fix for HIVE-7155
 

 Key: HIVE-7948
 URL: https://issues.apache.org/jira/browse/HIVE-7948
 Project: Hive
  Issue Type: Test
  Components: Tests, WebHCat
Reporter: Aswathy Chellammal Sreekumar
Assignee: Aswathy Chellammal Sreekumar
Priority: Minor
 Attachments: HIVE-7948.patch


 E2E Test to verify webhcat property templeton.mapper.memory.mb correctly 
 overrides mapreduce.map.memory.mb. The feature was added as part of HIVE-7155.




[jira] [Created] (HIVE-7961) metastore schema improvement for adding partition to Hive table

2014-09-03 Thread Chu Tong (JIRA)

Chu Tong created HIVE-7961:
--

 Summary: metastore schema improvement for adding partition to Hive 
table
 Key: HIVE-7961
 URL: https://issues.apache.org/jira/browse/HIVE-7961
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Chu Tong
Priority: Minor


One of the performance bottlenecks when adding a partition to a Hive table is 
the following query, which takes most of the time in this process:
SELECT A0.PART_NAME FROM PARTITIONS A0 LEFT OUTER JOIN TBLS B0 ON A0.TBL_ID = 
B0.TBL_ID LEFT OUTER JOIN DBS C0 ON B0.DB_ID = C0.DB_ID WHERE B0.TBL_NAME = @P0 
AND C0.NAME = @P1 AND A0.PART_NAME = @P2
This query joins the partition table with the table and database tables in the Hive 
metastore, and it becomes slow when these tables are big.
A viable way to optimize this is to de-normalize the partition table.
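
The idea behind de-normalizing can be shown with a small in-memory analogy. This is only a sketch under loud assumptions: the real metastore is backed by an RDBMS, PartitionLookupSketch and its methods are hypothetical names, and the maps below merely stand in for the DBS, TBLS and PARTITIONS tables. The normalized lookup resolves the database id, then the table id, then the partition (mirroring the two joins), while the de-normalized lookup keeps the database and table names next to the partition name and needs a single probe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory analogy; not the metastore schema or its code.
final class PartitionLookupSketch {
    // Normalized layout: stand-ins for DBS, TBLS and PARTITIONS.
    private final Map<String, Long> dbIds = new HashMap<>();         // db name -> DB_ID
    private final Map<List<Object>, Long> tblIds = new HashMap<>();  // (DB_ID, tbl name) -> TBL_ID
    private final Set<List<Object>> partitions = new HashSet<>();    // (TBL_ID, part name)

    // De-normalized layout: db and table names are copied into the partition key.
    private final Set<List<String>> denormalized = new HashSet<>();  // (db, tbl, part)

    private long nextId = 1;

    void addPartition(String db, String tbl, String part) {
        Long dbId = dbIds.get(db);
        if (dbId == null) {
            dbId = nextId++;
            dbIds.put(db, dbId);
        }
        List<Object> tblKey = Arrays.<Object>asList(dbId, tbl);
        Long tblId = tblIds.get(tblKey);
        if (tblId == null) {
            tblId = nextId++;
            tblIds.put(tblKey, tblId);
        }
        partitions.add(Arrays.<Object>asList(tblId, part));
        denormalized.add(Arrays.asList(db, tbl, part));
    }

    // Three dependent lookups, analogous to joining PARTITIONS, TBLS and DBS.
    boolean existsNormalized(String db, String tbl, String part) {
        Long dbId = dbIds.get(db);
        if (dbId == null) {
            return false;
        }
        Long tblId = tblIds.get(Arrays.<Object>asList(dbId, tbl));
        if (tblId == null) {
            return false;
        }
        return partitions.contains(Arrays.<Object>asList(tblId, part));
    }

    // One lookup against the de-normalized key.
    boolean existsDenormalized(String db, String tbl, String part) {
        return denormalized.contains(Arrays.asList(db, tbl, part));
    }
}
```

The trade-off is the usual one: the de-normalized form answers the existence check in one step, but the copied names must be kept in sync if a database or table is renamed.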




[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct

2014-09-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120161#comment-14120161
 ] 

Eugene Koifman commented on HIVE-7944:
--

+1 for the revert idea

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner

2014-09-03 Thread Pala M Muthaia (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120237#comment-14120237
 ] 

Pala M Muthaia commented on HIVE-7949:
--

[~ashutoshc]  [~navis], can either of you review or add appropriate reviewers 
for this patch?

Also, I ran the 2 tests that failed above locally and they passed. Thanks.

 Create table LIKE command doesn't set new owner
 ---

 Key: HIVE-7949
 URL: https://issues.apache.org/jira/browse/HIVE-7949
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.13.1
Reporter: Pala M Muthaia
Assignee: Pala M Muthaia
 Fix For: 0.13.0, 0.13.1

 Attachments: HIVE-7949.1.patch


 'Create table like' command doesn't set the current user as owner of new 
 table, instead new table owner is same as source table owner.
 This is a regression from 0.12




[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7944:
--
Status: Open  (was: Patch Available)

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7944:
--
Attachment: HIVE-7944.2.patch

reverse the patch

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7944:
--
Status: Patch Available  (was: Open)

reverse the patch

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




Re: Review Request 25281: speed up the write path of col stats of partitions

2014-09-03 Thread pengcheng xiong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25281/
---

(Updated Sept. 3, 2014, 6:50 p.m.)


Review request for hive.


Repository: hive-git


Description (updated)
---

reverse the patch


Diffs (updated)
-

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 68b5563 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java a16d1c2 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 0fdafa2 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 6b5e79d 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 981c5ff 

Diff: https://reviews.apache.org/r/25281/diff/


Testing
---


Thanks,

pengcheng xiong

[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner


[ 
https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120252#comment-14120252
 ] 

Ashutosh Chauhan commented on HIVE-7949:


LGTM +1 cc: [~thejas]

 Create table LIKE command doesn't set new owner
 ---

 Key: HIVE-7949
 URL: https://issues.apache.org/jira/browse/HIVE-7949
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.13.1
Reporter: Pala M Muthaia
Assignee: Pala M Muthaia
 Fix For: 0.13.0, 0.13.1

 Attachments: HIVE-7949.1.patch


 'Create table like' command doesn't set the current user as owner of new 
 table, instead new table owner is same as source table owner.
 This is a regression from 0.12




[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120254#comment-14120254
 ] 

pengcheng xiong commented on HIVE-7944:
---

After I tested this with mysql which has tables pre-created (not auto created 
via datanucleus), I found that my method does not work. Thus, I followed the 
revert idea.

https://reviews.apache.org/r/25281/

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120263#comment-14120263
 ] 

Ashutosh Chauhan commented on HIVE-7944:


+1

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We worked hard towards faster update stats for columns of a partition of a 
 table previously 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, it is only correct in the first run. 
 There will be duplicate column stats later. Thanks to [~ekoifman] 's comments




[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization

2014-09-03 Thread Ashu Pachauri (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120264#comment-14120264
 ] 

Ashu Pachauri commented on HIVE-7943:
-

[~thejas] Can you have a look at this?

 hive.security.authorization.createtable.owner.grants is ineffective with 
 Default Authorization
 --

 Key: HIVE-7943
 URL: https://issues.apache.org/jira/browse/HIVE-7943
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.13.1
Reporter: Ashu Pachauri
 Attachments: HIVE-7943.1.patch


 HIVE-6250 separates owner privileges from user privileges. However, Default 
 Authorization does not adapt to the change and table owners do not inherit 
 permissions from the config.
 Steps to Reproduce:
 set hive.security.authorization.enabled=true;
 set hive.security.authorization.createtable.owner.grants=ALL;
 create table temp_table(id int, value string);
 drop table temp_table;
 Above set of operations throw the following error:
 
 Authorization failed:No privilege 'Drop' found for outputs { 
 database:default, table:temp_table}. Use SHOW GRANT to get more details.
 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' 
 found for outputs { database:default, table:temp_table}. Use SHOW GRANT to 
 get more details.




[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-03 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---
Attachment: HIVE-7223.5.patch

I've taken your advice a step further, and removed redundancy in 
{{HiveMetaStore.initializeAddedPartition()}}, 
{{MetaStoreUtils.updatePartitionStatsFast()}}, and 
{{Warehouse.getFileStatusesForSD()}}. Cleaner, all around.

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
 HIVE-7223.4.patch, HIVE-7223.5.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ;
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
     public List<Partition> getPartitions();
     public List<String> getPartNames();
     public Iterator<Partition> getPartitionIter();
     public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?
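
As a rough illustration of the shared-StorageDescriptor idea in the description above, here is a minimal, self-contained sketch. It makes several assumptions: Partition and StorageDescriptor are simplified stand-ins rather than the real Hive/Thrift classes, the interface is trimmed to two methods, and SharedSdPartitionSpec is a hypothetical name playing the role the description gives to a directory-based spec. It does not touch HDFS; the sub-directory names are simply passed in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these types are the actual Hive metastore classes.
final class PartitionSpecSketch {

    // Simplified stand-in: the real StorageDescriptor carries far more than a location.
    static final class StorageDescriptor {
        final String location;
        StorageDescriptor(String location) { this.location = location; }
    }

    // Simplified stand-in for a partition: a name plus a (possibly shared) descriptor.
    static final class Partition {
        final String name;
        final StorageDescriptor sd;
        Partition(String name, StorageDescriptor sd) { this.name = name; this.sd = sd; }
    }

    // Trimmed-down version of the proposed interface.
    interface PartitionSpec {
        List<String> getPartNames();
        Iterator<Partition> getPartitionIter();
    }

    // "Normal" case: one descriptor for the root directory plus sub-directory names.
    static final class SharedSdPartitionSpec implements PartitionSpec {
        private final StorageDescriptor rootSd;
        private final List<String> subDirs;

        SharedSdPartitionSpec(String rootLocation, List<String> subDirs) {
            this.rootSd = new StorageDescriptor(rootLocation);
            this.subDirs = new ArrayList<>(subDirs);
        }

        @Override
        public List<String> getPartNames() {
            return new ArrayList<>(subDirs);
        }

        // Partitions are materialized one at a time, all pointing at the same descriptor.
        @Override
        public Iterator<Partition> getPartitionIter() {
            final Iterator<String> names = subDirs.iterator();
            return new Iterator<Partition>() {
                @Override public boolean hasNext() { return names.hasNext(); }
                @Override public Partition next() { return new Partition(names.next(), rootSd); }
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }
}
```

Because the iterator creates each Partition lazily and every instance refers to the same StorageDescriptor object, the representation stays small regardless of how many sub-partitions are listed, which is the memory argument made above.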




[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-03 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---
Status: Patch Available  (was: Open)

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.13.0, 0.12.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch, HIVE-7223.3.patch, 
 HIVE-7223.4.patch, HIVE-7223.5.patch


 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
     public List<Partition> getPartitions();
     public List<String> getPartNames();
     public Iterator<Partition> getPartitionIter();
     public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than status quo.
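 The proposal above could be sketched roughly as follows. This is only an 
 illustration under stated assumptions, not the code in the attached patches: 
 {{StorageDescriptor}} and {{Partition}} are simplified stand-ins for the 
 Thrift-generated classes, the interface is reduced to its iterator accessor, 
 and {{SharedRootPartitionSpec}} is a hypothetical name for a spec that 
 derives partitions lazily from one root directory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the Thrift-generated metastore classes.
final class StorageDescriptor {
    final String location;
    StorageDescriptor(String location) { this.location = location; }
}

final class Partition {
    final List<String> values;
    final StorageDescriptor sd;
    Partition(List<String> values, StorageDescriptor sd) {
        this.values = values;
        this.sd = sd;
    }
}

// Reduced to the iterator accessor for brevity.
interface PartitionSpec {
    Iterator<Partition> getPartitionIter();
}

// Fallback: wraps an already materialized list, so it is no worse than today.
final class DefaultPartitionSpec implements PartitionSpec {
    private final List<Partition> partitions;
    DefaultPartitionSpec(List<Partition> partitions) { this.partitions = partitions; }
    @Override
    public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
}

// Condensed form (hypothetical name): one root location plus sub-directory
// names. Partition objects are only built when the iterator is advanced.
final class SharedRootPartitionSpec implements PartitionSpec {
    private final String dateValue;
    private final String rootLocation;
    private final List<String> subDirs;

    SharedRootPartitionSpec(String dateValue, String rootLocation, List<String> subDirs) {
        this.dateValue = dateValue;
        this.rootLocation = rootLocation;
        this.subDirs = subDirs;
    }

    @Override
    public Iterator<Partition> getPartitionIter() {
        final Iterator<String> names = subDirs.iterator();
        return new Iterator<Partition>() {
            @Override public boolean hasNext() { return names.hasNext(); }
            @Override public Partition next() {
                String name = names.next();
                return new Partition(Arrays.asList(dateValue, name),
                        new StorageDescriptor(rootLocation + "/" + name));
            }
        };
    }
}
```

 Only the root location and the sub-directory names would need to travel over 
 the wire in the condensed case; the sketch assumes Java 8 or later, where 
 {{Iterator.remove()}} has a default implementation.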
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?




[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-09-03 Thread Mithun Radhakrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---
Status: Open  (was: Patch Available)


[jira] [Commented] (HIVE-7949) Create table LIKE command doesn't set new owner


[ 
https://issues.apache.org/jira/browse/HIVE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120267#comment-14120267
 ] 

Thejas M Nair commented on HIVE-7949:
-

+1

 Create table LIKE command doesn't set new owner
 ---

 Key: HIVE-7949
 URL: https://issues.apache.org/jira/browse/HIVE-7949
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.13.1
Reporter: Pala M Muthaia
Assignee: Pala M Muthaia
 Fix For: 0.13.0, 0.13.1

 Attachments: HIVE-7949.1.patch


 The 'Create table like' command doesn't set the current user as the owner of 
 the new table; instead, the new table's owner is the same as the source 
 table's owner. This is a regression from 0.12.




[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---
Status: In Progress  (was: Patch Available)

 Add vectorized support for CHAR/VARCHAR data types
 --

 Key: HIVE-5760
 URL: https://issues.apache.org/jira/browse/HIVE-5760
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Matt McCline
 Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, 
 HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, 
 HIVE-5760.91.patch


 Add support to allow queries referencing VARCHAR columns and expression 
 results to run efficiently in vectorized mode. This should re-use the code 
 for the STRING type to the extent possible and beneficial. Include unit tests 
 and end-to-end tests. Consider re-using or extending existing end-to-end 
 tests for vectorized string operations.




[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---
Attachment: HIVE-5760.92.patch

 Add vectorized support for CHAR/VARCHAR data types
 --

 Key: HIVE-5760
 URL: https://issues.apache.org/jira/browse/HIVE-5760
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Matt McCline
 Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, 
 HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch, HIVE-5760.8.patch, 
 HIVE-5760.91.patch, HIVE-5760.92.patch


 Add support to allow queries referencing VARCHAR columns and expression 
 results to run efficiently in vectorized mode. This should re-use the code 
 for the STRING type to the extent possible and beneficial. Include unit tests 
 and end-to-end tests. Consider re-using or extending existing end-to-end 
 tests for vectorized string operations.




[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types


 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---
Status: Patch Available  (was: In Progress)


[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization

[
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120306#comment-14120306
]

Thejas M Nair commented on HIVE-7943:
-

This patch does not add the owner grants to the table metadata, which is the
purpose of this configuration flag. Instead, it adds the privileges at
runtime during the checks.

Looking at the current code again, I don't see a bug there with respect to the
privileges getting set at table creation. I wonder if the problem is that ALL
privileges are not being correctly interpreted as including the Drop
privilege.

For the example in the description, can you paste the output of 'show
grant on table temp_table'?

hive.security.authorization.createtable.owner.grants is ineffective with
Default Authorization
--

Key: HIVE-7943
URL: https://issues.apache.org/jira/browse/HIVE-7943
Project: Hive
Issue Type: Bug
Components: Authorization
Affects Versions: 0.13.1
Reporter: Ashu Pachauri
Attachments: HIVE-7943.1.patch

HIVE-6250 separates owner privileges from user privileges. However, Default
Authorization does not adapt to the change and table owners do not inherit
permissions from the config.
Steps to Reproduce:
set hive.security.authorization.enabled=true;
set hive.security.authorization.createtable.owner.grants=ALL;
create table temp_table(id int, value string);
drop table temp_table;
The above set of operations throws the following error:

Authorization failed:No privilege 'Drop' found for outputs {
database:default, table:temp_table}. Use SHOW GRANT to get more details.
14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop'
found for outputs { database:default, table:temp_table}. Use SHOW GRANT to
get more details.


[jira] [Commented] (HIVE-3865) Allow collect_set to work on non-primitive types

2014-09-03 Thread karthik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120362#comment-14120362
 ] 

karthik commented on HIVE-3865:
---

Did you build a UDF for collect_set() of structs? How did you sort out the 
issue?

 Allow collect_set to work on non-primitive types
 

 Key: HIVE-3865
 URL: https://issues.apache.org/jira/browse/HIVE-3865
 Project: Hive
  Issue Type: Improvement
Reporter: Ron Bodkin






[jira] [Updated] (HIVE-7946) CBO: Merge CBO changes to Trunk


 [ 
https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7946:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-5775)

 CBO: Merge CBO changes to Trunk
 ---

 Key: HIVE-7946
 URL: https://issues.apache.org/jira/browse/HIVE-7946
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7946.patch







[jira] [Created] (HIVE-7962) Prevent Alter Table, drop,show Code paths from exercising CBO

Laljo John Pullokkaran created HIVE-7962:


 Summary: Prevent Alter Table, drop,show Code paths from exercising 
CBO
 Key: HIVE-7962
 URL: https://issues.apache.org/jira/browse/HIVE-7962
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran







[jira] [Created] (HIVE-7963) Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr

Laljo John Pullokkaran created HIVE-7963:


 Summary: Handle UDFS : Hash, round, if, datediff, date_add, 
date_sub, ascii, elt, coalesce, format_number, instr 
 Key: HIVE-7963
 URL: https://issues.apache.org/jira/browse/HIVE-7963
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran







[jira] [Created] (HIVE-7965) Handle Row Schema

Laljo John Pullokkaran created HIVE-7965:


 Summary: Handle Row Schema
 Key: HIVE-7965
 URL: https://issues.apache.org/jira/browse/HIVE-7965
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran







[jira] [Created] (HIVE-7964) Handle explode, lateral views

Laljo John Pullokkaran created HIVE-7964:


 Summary: Handle explode, lateral views
 Key: HIVE-7964
 URL: https://issues.apache.org/jira/browse/HIVE-7964
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran







[jira] [Created] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure

Laljo John Pullokkaran created HIVE-7966:


 Summary: CBO Trunk Merge: Hive Unit test Subquery test failure
 Key: HIVE-7966
 URL: https://issues.apache.org/jira/browse/HIVE-7966
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran







[jira] [Commented] (HIVE-7944) current update stats for columns of a partition of a table is not correct


[ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120383#comment-14120383
 ] 

Hive QA commented on HIVE-7944:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666289/HIVE-7944.2.patch

{color:green}SUCCESS:{color} +1 6142 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/621/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/621/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-621/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12666289

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We previously worked on faster column-stats updates for a partition of a 
 table in 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, the result is only correct on the first 
 run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments.




[jira] [Updated] (HIVE-7963) CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, format_number, instr


 [ 
https://issues.apache.org/jira/browse/HIVE-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7963:
-
Summary: CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, 
date_sub, ascii, elt, coalesce, format_number, instr   (was: Handle UDFS : 
Hash, round, if, datediff, date_add, date_sub, ascii, elt, coalesce, 
format_number, instr )

 CBO Trunk Merge:Handle UDFS : Hash, round, if, datediff, date_add, date_sub, 
 ascii, elt, coalesce, format_number, instr 
 

 Key: HIVE-7963
 URL: https://issues.apache.org/jira/browse/HIVE-7963
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran






[jira] [Assigned] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure


 [ 
https://issues.apache.org/jira/browse/HIVE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-7966:


Assignee: Harish Butani

 CBO Trunk Merge: Hive Unit test Subquery test failure
 -

 Key: HIVE-7966
 URL: https://issues.apache.org/jira/browse/HIVE-7966
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Harish Butani






[jira] [Updated] (HIVE-7962) CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from exercising CBO


 [ 
https://issues.apache.org/jira/browse/HIVE-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7962:
-
Summary: CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from 
exercising CBO  (was: Prevent Alter Table, drop,show Code paths from exercising 
CBO)

 CBO Trunk Merge:Prevent Alter Table, drop,show Code paths from exercising CBO
 -

 Key: HIVE-7962
 URL: https://issues.apache.org/jira/browse/HIVE-7962
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran






[jira] [Updated] (HIVE-7964) CBO Trunk Merge:Handle explode, lateral views


 [ 
https://issues.apache.org/jira/browse/HIVE-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7964:
-
Summary: CBO Trunk Merge:Handle explode, lateral views  (was: Handle 
explode, lateral views)

 CBO Trunk Merge:Handle explode, lateral views
 -

 Key: HIVE-7964
 URL: https://issues.apache.org/jira/browse/HIVE-7964
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran






[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization

2014-09-03 Thread Ashu Pachauri (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120395#comment-14120395
]

Ashu Pachauri commented on HIVE-7943:
-

Is that the purpose of the configuration flag? I thought the reason for
separating owner grants from user grants was that the owner grants are
dynamically applied at the time of authorization to the current owner (if there
were a way to change the owner). If they are persisted in metadata, the
grants need to be changed when the owner changes or when the configuration
property changes (e.g. from ALL to SELECT, DROP etc.).

'show grant on temp_table' gives me empty results unless I explicitly do a
'grant all on temp_table to user testuser'. The problem is not limited to
ALL privileges: the same problem occurs when I change the
configuration property to DROP instead of ALL.
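
The two readings of the flag discussed in this thread can be contrasted with a
small sketch. It is not Hive code: {{Privilege}}, {{Table}} and
{{OwnerGrantModels}} are hypothetical names, and the grant store is a plain
in-memory map, assumed only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified privilege set.
enum Privilege { SELECT, DROP, ALTER }

// Hypothetical table record: an owner plus grants persisted in metadata.
final class Table {
    String owner;
    final Map<String, Set<Privilege>> storedGrants = new HashMap<>();
    Table(String owner) { this.owner = owner; }
}

final class OwnerGrantModels {
    // Reading 1: copy the configured owner grants into metadata at creation.
    static void persistAtCreation(Table table, Set<Privilege> ownerGrants) {
        table.storedGrants.put(table.owner, new HashSet<>(ownerGrants));
    }

    // Authorization that only consults persisted grants.
    static boolean checkStored(Table table, String user, Privilege wanted) {
        Set<Privilege> grants = table.storedGrants.get(user);
        return grants != null && grants.contains(wanted);
    }

    // Reading 2: apply the configured owner grants to whoever owns the
    // table at check time, then fall back to persisted grants.
    static boolean checkDynamic(Table table, String user, Privilege wanted,
                                Set<Privilege> ownerGrants) {
        if (user.equals(table.owner) && ownerGrants.contains(wanted)) {
            return true;
        }
        return checkStored(table, user, wanted);
    }
}
```

Under the first reading, a later change of owner or of the configured
privileges leaves stale rows in {{storedGrants}}; under the second, nothing is
persisted, so 'show grant' would have nothing to display for the owner.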


[jira] [Commented] (HIVE-7966) CBO Trunk Merge: Hive Unit test Subquery test failure


[ 
https://issues.apache.org/jira/browse/HIVE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120400#comment-14120400
 ] 

Laljo John Pullokkaran commented on HIVE-7966:
--

1.
select *
from (select *
  from src b
  where exists
  (select a.key
  from src a
  where b.value = a.value and a.key = b.key and a.value > 'val_9')
 ) a
2.
select b.key, min(b.value)
from src b
group by b.key
having exists ( select a.key
from src a
where a.value > 'val_9' and a.value = min(b.value)
)
3.
select p.p_partkey, li.l_suppkey
from (select distinct l_partkey as p_partkey from lineitem) p join lineitem li 
on p.p_partkey = li.l_partkey
where li.l_linenumber = 1 and
 li.l_orderkey in (select l_orderkey from lineitem where l_shipmode = 'AIR' and 
l_linenumber = li.l_linenumber)


4.
explain
select p_mfgr, p_name, avg(p_size)
from part
group by p_mfgr, p_name
having p_name in
  (select first_value(p_name) over(partition by p_mfgr order by p_size) from 
part)

5.
select *
from src b
where not exists
  (select a.key
  from src a
  where b.value = a.value and a.value > 'val_2'
  )

6.
select *
from src b
group by key, value
having not exists
  (select distinct a.key
  from src a
  where b.value = a.value and a.value > 'val_12'
  )

7.
select *
from T1_v where T1_v.key not in (select T2_v.key from T2_v)

8.
select b.p_mfgr, min(p_retailprice)
from part b
group by b.p_mfgr
having b.p_mfgr not in
  (select p_mfgr
  from part a
  group by p_mfgr
  having max(p_retailprice) - min(p_retailprice) > 600
  )

9.
explain
select p_mfgr, b.p_name, p_size
from part b
where b.p_name not in
  (select p_name
  from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by 
p_size) as r from part) a
  where r = 2 and b.p_mfgr = p_mfgr
  )

10.
select *
from cv3
where cv3.key in (select key from cv1)





[jira] [Created] (HIVE-7967) CBO Trunk Merge: Fall Back in case of complex types

Laljo John Pullokkaran created HIVE-7967:


 Summary: CBO Trunk Merge: Fall Back in case of complex types
 Key: HIVE-7967
 URL: https://issues.apache.org/jira/browse/HIVE-7967
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Ashutosh Chauhan







[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7944:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Pengcheng!

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Fix For: 0.14.0

 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We previously worked on faster column-stats updates for a partition of a 
 table in 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, the result is only correct on the first 
 run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments.




[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7944:
---
Affects Version/s: 0.14.0

 current update stats for columns of a partition of a table is not correct
 -

 Key: HIVE-7944
 URL: https://issues.apache.org/jira/browse/HIVE-7944
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.14.0
Reporter: pengcheng xiong
Assignee: pengcheng xiong
 Fix For: 0.14.0

 Attachments: HIVE-7944.1.patch, HIVE-7944.2.patch


 We previously worked on faster column-stats updates for a partition of a 
 table in 
 https://issues.apache.org/jira/browse/HIVE-7736
 and
 https://issues.apache.org/jira/browse/HIVE-7876
 Although there is some improvement, the result is only correct on the first 
 run; later runs produce duplicate column stats. Thanks to [~ekoifman] for the comments.




[jira] [Updated] (HIVE-7944) current update stats for columns of a partition of a table is not correct


 [ 
https://issues.apache.org/jira/browse/HIVE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7944:
---
Component/s: Statistics


[jira] [Commented] (HIVE-7580) Support dynamic partitioning [Spark Branch]

2014-09-03 Thread Chinna Rao Lalam (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120438#comment-14120438
 ] 

Chinna Rao Lalam commented on HIVE-7580:


I verified the tests below; all of them pass except load_dyn_part1.q and 
load_dyn_part8.q.

{noformat}
load_dyn_part1.q,
load_dyn_part2.q,
load_dyn_part3.q,
load_dyn_part4.q,
load_dyn_part5.q,
load_dyn_part6.q,
load_dyn_part7.q,
load_dyn_part8.q,
load_dyn_part9.q,
load_dyn_part10.q,
load_dyn_part11.q,
load_dyn_part12.q,
load_dyn_part13.q,
load_dyn_part14.q,
load_dyn_part15.q
{noformat}

To enable tests for dynamic partitioning, I considered the tests below (taken 
from Tez):

{noformat}
load_dyn_part1.q,
load_dyn_part2.q,
load_dyn_part3.q,
dynpart_sort_optimization.q,
dynpart_sort_opt_vectorization.q
{noformat}

Across both lists, the following tests are failing:
{noformat}
load_dyn_part1.q,
load_dyn_part8.q,
dynpart_sort_optimization.q,
dynpart_sort_opt_vectorization.q 
{noformat}

These 4 test cases have known issues. I will add these tests to the 
corresponding JIRAs and work on them there.

{quote}
load_dyn_part1.q and load_dyn_part8.q both contain multi-inserts. They need 
to be retested after HIVE-7503 is fixed.
{quote}

{quote}
dynpart_sort_opt_vectorization.q is related to vectorization. It needs to be 
retested after HIVE-7794 is fixed.
{quote}

{quote}
dynpart_sort_optimization.q hits the same exception as HIVE-7843.
{quote}

 Support dynamic partitioning [Spark Branch]
 ---

 Key: HIVE-7580
 URL: https://issues.apache.org/jira/browse/HIVE-7580
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
  Labels: Spark-M1

 My understanding is that we don't need to do anything special for this. 
 However, this needs to be verified and tested.




[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization

[
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120440#comment-14120440
]

Thejas M Nair commented on HIVE-7943:
-

The description of the configuration also states its purpose: the
privileges are automatically granted to the owner whenever a table gets created.
This is also the case with the user grants configuration.
That purpose was not intentionally changed.

The reason for separating user grants and owner grants was so that the owner
user is set correctly when the owner is changed within a session (for ease of
testing).


[jira] [Commented] (HIVE-7943) hive.security.authorization.createtable.owner.grants is ineffective with Default Authorization


[ 
https://issues.apache.org/jira/browse/HIVE-7943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120444#comment-14120444
 ] 

Thejas M Nair commented on HIVE-7943:
-

You can try tracing through the calls made from Hive.createTable to 
CreateTableAutomaticGrant.getUserGrants, where it adds the grants to the table 
object.
 

 hive.security.authorization.createtable.owner.grants is ineffective with 
 Default Authorization
 --

 Key: HIVE-7943
 URL: https://issues.apache.org/jira/browse/HIVE-7943
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.13.1
Reporter: Ashu Pachauri
 Attachments: HIVE-7943.1.patch


 HIVE-6250 separates owner privileges from user privileges. However, Default 
 Authorization does not adapt to the change and table owners do not inherit 
 permissions from the config.
 Steps to Reproduce:
 set hive.security.authorization.enabled=true;
 set hive.security.authorization.createtable.owner.grants=ALL;
 create table temp_table(id int, value string);
 drop table temp_table;
 The above set of operations throws the following error:
 
 Authorization failed:No privilege 'Drop' found for outputs { 
 database:default, table:temp_table}. Use SHOW GRANT to get more details.
 14/09/02 17:49:38 ERROR ql.Driver: Authorization failed:No privilege 'Drop' 
 found for outputs { database:default, table:temp_table}. Use SHOW GRANT to 
 get more details.
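
 The expected behaviour described in the report can be modelled with a small Java sketch. It assumes the owner-grants config value is either ALL or a comma-separated privilege list, and that the owner should receive those privileges when the table is created. The class, enum, and method names are hypothetical and are not Hive's actual authorization classes.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

final class OwnerGrantSketch {
    // Hypothetical privilege list, for illustration only.
    enum Privilege { SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER }

    // Parse a config value such as "ALL" or "select,drop" into a privilege set.
    static Set<Privilege> parseGrants(String configValue) {
        Set<Privilege> result = EnumSet.noneOf(Privilege.class);
        if (configValue == null || configValue.trim().isEmpty()) {
            return result;
        }
        if (configValue.trim().equalsIgnoreCase("ALL")) {
            return EnumSet.allOf(Privilege.class);
        }
        for (String name : configValue.split(",")) {
            result.add(Privilege.valueOf(name.trim().toUpperCase(Locale.ROOT)));
        }
        return result;
    }

    // Assumption: the table owner is granted exactly the configured privileges at creation.
    static boolean ownerMayDrop(String ownerGrantsConfig) {
        return parseGrants(ownerGrantsConfig).contains(Privilege.DROP);
    }

    public static void main(String[] args) {
        System.out.println(ownerMayDrop("ALL"));
    }
}
```

 Under this model, setting the owner grants to ALL should let the owner drop the table, which is the outcome the reproduction steps expect and do not get.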




[jira] [Commented] (HIVE-7508) Kerberos support for streaming

2014-09-03 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120448#comment-14120448
 ] 

Roshan Naik commented on HIVE-7508:
---

[~leftylev], yes, thanks for bringing it up. I will work with [~alangates] on 
updating that.

 Kerberos support for streaming
 --

 Key: HIVE-7508
 URL: https://issues.apache.org/jira/browse/HIVE-7508
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Roshan Naik
Assignee: Roshan Naik
  Labels: Streaming, TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7508.patch


 Add kerberos support for streaming to secure Hive cluster.




[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2


 [ 
https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7968:
---
Fix Version/s: 0.14.0

 Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
 --

 Key: HIVE-7968
 URL: https://issues.apache.org/jira/browse/HIVE-7968
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0


 MiniHS2 uses MiniMr. Makes no sense to have two test cases for same setup.




[jira] [Created] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2

Vaibhav Gumashta created HIVE-7968:
--

 Summary: Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
 Key: HIVE-7968
 URL: https://issues.apache.org/jira/browse/HIVE-7968
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta


MiniHS2 uses MiniMr. Makes no sense to have two test cases for same setup.




[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]

2014-09-03 Thread Chinna Rao Lalam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7580:
---
Attachment: HIVE-7580.patch

Patch contains passed test cases.

 Support dynamic partitioning [Spark Branch]
 ---

 Key: HIVE-7580
 URL: https://issues.apache.org/jira/browse/HIVE-7580
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
  Labels: Spark-M1
 Attachments: HIVE-7580.patch


 My understanding is that we don't need to do anything special for this. 
 However, this needs to be verified and tested.




[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2


 [ 
https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7968:
---
Affects Version/s: 0.14.0

 Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
 --

 Key: HIVE-7968
 URL: https://issues.apache.org/jira/browse/HIVE-7968
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0


 MiniHS2 uses MiniMr. Makes no sense to have two test cases for same setup.




[jira] [Updated] (HIVE-7968) Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2


 [ 
https://issues.apache.org/jira/browse/HIVE-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7968:
---
Description: MiniHS2 uses MiniMr. Makes no sense to have two test cases for 
same setup when JDBC is the client api for HS2.  (was: MiniHS2 uses MiniMr. 
Makes no sense to have two test cases for same setup.)

 Merge tests in TestJdbcWithMiniMr with TestJdbcWithMiniHS2
 --

 Key: HIVE-7968
 URL: https://issues.apache.org/jira/browse/HIVE-7968
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0


 MiniHS2 uses MiniMr. Makes no sense to have two test cases for same setup 
 when JDBC is the client api for HS2.




[jira] [Updated] (HIVE-7811) Compactions need to update table/partition stats

2014-09-03 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7811:
-
Status: Patch Available  (was: Open)

 Compactions need to update table/partition stats
 

 Key: HIVE-7811
 URL: https://issues.apache.org/jira/browse/HIVE-7811
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch, HIVE-7811.5.patch, 
 HIVE-7811.6.patch


 Compactions should trigger stats recalculation for columns which already have 
 stats.
 https://reviews.apache.org/r/25201/
 Major compactions will cause the Compactor to see which columns already have 
 stats and run analyze command for those columns.  If compacting a partition 
 then stats for that partition will be computed.  If table is not partitioned, 
 then the whole table.
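
 As an illustration of the behaviour described above, the following Java sketch builds the kind of column-statistics statement that could be run after a major compaction. It assumes the statement takes the form ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS; the class and method names are hypothetical and this is not the code from the attached patches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class CompactionStatsSketch {
    // Build an ANALYZE statement for the columns that already have stats.
    // partitionSpec is null when the table is not partitioned.
    static Optional<String> analyzeCommand(String table, String partitionSpec,
                                           List<String> columnsWithStats) {
        if (columnsWithStats.isEmpty()) {
            return Optional.empty(); // no existing column stats, so nothing to recompute
        }
        StringBuilder sb = new StringBuilder("ANALYZE TABLE ").append(table);
        if (partitionSpec != null) {
            sb.append(" PARTITION(").append(partitionSpec).append(")");
        }
        sb.append(" COMPUTE STATISTICS FOR COLUMNS ")
          .append(String.join(", ", columnsWithStats));
        return Optional.of(sb.toString());
    }

    public static void main(String[] args) {
        analyzeCommand("t", "ds='2014-09-03'", Arrays.asList("a", "b"))
            .ifPresent(System.out::println);
    }
}
```

 Passing a null partition spec covers the unpartitioned case, where the statement applies to the whole table.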




[jira] [Updated] (HIVE-7208) move SearchArgument interface into serde package

2014-09-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7208:
---
Attachment: HIVE-7208.03.patch

File deletion has not been rebased properly

 move SearchArgument interface into serde package
 

 Key: HIVE-7208
 URL: https://issues.apache.org/jira/browse/HIVE-7208
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-7208.01.patch, HIVE-7208.02.patch, 
 HIVE-7208.03.patch, HIVE-7208.patch


 For usage in alternative input formats/serdes, it might be useful to move 
 SearchArgument class to a place that is not in ql (because it's hard to 
 depend on ql).




[jira] [Commented] (HIVE-7956) When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]


[ 
https://issues.apache.org/jira/browse/HIVE-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120484#comment-14120484
 ] 

Brock Noland commented on HIVE-7956:


I thought that MR does this by setting the number of reducers equal to the 
number of buckets.

 When inserting into a bucketed table, all data goes to a single bucket [Spark 
 Branch]
 -

 Key: HIVE-7956
 URL: https://issues.apache.org/jira/browse/HIVE-7956
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li

 I created a bucketed table:
 {code}
 create table testBucket(x int,y string) clustered by(x) into 10 buckets;
 {code}
 Then I run a query like:
 {code}
 set hive.enforce.bucketing = true;
 insert overwrite table testBucket select intCol,stringCol from src;
 {code}
 Here {{src}} is a simple textfile-based table containing 4000 records 
 (not bucketed). The query launches 10 reduce tasks but all the data goes to 
 only one of them.
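
 For context on why the distribution across reducers matters, here is a minimal Java sketch of hash-based bucket assignment. It assumes a row's bucket is the non-negative hash of the clustering column modulo the bucket count, and that an int column hashes to its own value. The class and method names are hypothetical and this is not Hive's actual code.

```java
import java.util.Arrays;

final class BucketSketch {
    // Assumed rule: clear the sign bit, then take the remainder by the bucket count.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Count how many of the given keys land in each bucket.
    static int[] histogram(int[] keys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key : keys) {
            counts[bucketFor(Integer.hashCode(key), numBuckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[4000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        System.out.println(Arrays.toString(histogram(keys, 10)));
    }
}
```

 With distinct keys spread over the value range, every one of the 10 buckets receives rows under this rule, so a run where a single reduce task gets all the data suggests the rows are not being partitioned by the bucketing column.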




[jira] [Commented] (HIVE-6948) HiveServer2 doesn't respect HIVE_AUX_JARS_PATH


[ 
https://issues.apache.org/jira/browse/HIVE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120490#comment-14120490
 ] 

Brock Noland commented on HIVE-6948:


This is a dup of HIVE-6820.

 HiveServer2 doesn't respect HIVE_AUX_JARS_PATH
 --

 Key: HIVE-6948
 URL: https://issues.apache.org/jira/browse/HIVE-6948
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Peng Zhang
 Fix For: 0.14.0

 Attachments: HIVE-6948.patch, HIVE-6948.patch


 HiveServer2 ignores HIVE_AUX_JARS_PATH.
 As a result, aux jars are not distributed to the YARN cluster, and jobs fail 
 without their dependent jars.




[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez


[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120495#comment-14120495
 ] 

Gunther Hagleitner commented on HIVE-7826:
--

Thanks [~damien.carol]. Your last comment definitely made my day :-)

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
  Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14, tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
 HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in select ... from ...).
 The resulting joins in hive require a full table scan of the large table 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master - where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically, 
 while the query is running.
 The approach is straightforward:
 - Insert synthetic conditions for each join representing x in (keys of other 
 side in join)
 - These conditions will be pushed as far down as possible
 - If the condition hits a table scan and the column involved is a partition 
 column:
   - Set up an Operator to send key events to the AM
 - else:
   - Remove the synthetic predicate
 Add these properties:
 ||Property||Default Value||
 |{{hive.tez.dynamic.partition.pruning}}|true|
 |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
 |{{hive.tez.dynamic.partition.pruning.max.data.size}}|100*1024*1024L|
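
 The pruning step itself can be sketched in Java as follows. This is a simplified model, not the patch's implementation: it assumes the join keys from the other side are already available as a set, and the class and method names are hypothetical. In the design described above, those keys would instead arrive at the application master as events while the query runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PartitionPruningSketch {
    // Keep only the partitions whose partition-column value appears among the
    // join keys seen on the other side of the join.
    static List<String> prune(List<String> partitionValues, Set<String> joinKeys) {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (joinKeys.contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2014-09-01", "2014-09-02", "2014-09-03");
        Set<String> keys = new HashSet<>(Arrays.asList("2014-09-03"));
        System.out.println(prune(partitions, keys));
    }
}
```

 Only the partitions that survive this filter need to be turned into splits, which is what avoids the full scan of the large table.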




[jira] [Created] (HIVE-7969) Use Optiq's native FieldTrimmer instead of HiveRelFieldTrimmer

Ashutosh Chauhan created HIVE-7969:
--

 Summary: Use Optiq's native FieldTrimmer instead of 
HiveRelFieldTrimmer
 Key: HIVE-7969
 URL: https://issues.apache.org/jira/browse/HIVE-7969
 Project: Hive
  Issue Type: Sub-task
  Components: CBO, Logical Optimizer
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


After the patch series OPTIQ-391, OPTIQ-392, OPTIQ-395, and OPTIQ-396, it's now 
possible to use Optiq's native FieldTrimmer. So, let's use it.




[jira] [Updated] (HIVE-6948) HiveServer2 doesn't respect HIVE_AUX_JARS_PATH