[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040459#comment-14040459
 ] 

Hive QA commented on HIVE-7237:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651909/HIVE-7237.2.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5669 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/556/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/556/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-556/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651909

 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
 -

 Key: HIVE-7237
 URL: https://issues.apache.org/jira/browse/HIVE-7237
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.13.0
 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY
Reporter: Douglas Moore
Assignee: Navis
 Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt


 Running set hive.exec.parallel=true; causes the YARN application instance to 
 linger forever. With set hive.exec.parallel=false; the application goes away 
 as soon as the Hive query completes. The underlying table is an ORC 
 store_sales table compressed with SNAPPY.
 {code}
 set hive.exec.parallel=true;
 select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825;
 {code}
 The query runs under Tez and finishes in < 30 seconds.
 After 30-40 of these jobs the cluster gets to a point where no jobs will 
 finish.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2014-06-23 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7172:
--

Attachment: HIVE-7172.patch

 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.
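
One common way to avoid this kind of leak is try-with-resources, which closes every declared resource even when the body throws. The sketch below is only an illustration and is not the attached patch: the Tracked class is a hypothetical stand-in for Connection, Statement and ResultSet, the returned string is a placeholder, and IllegalStateException stands in for HiveMetaException so the example runs without a metastore.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaVersionSketch {
    /** Records the order in which resources are closed. */
    static final List<String> closed = new ArrayList<>();

    /** Hypothetical stand-in for Connection / Statement / ResultSet. */
    static final class Tracked implements AutoCloseable {
        private final String name;

        Tracked(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    /** Same shape as the quoted snippet: throws when no version row is found. */
    static String readVersion(boolean hasRow) {
        // Resources are closed in reverse declaration order on every exit path.
        try (Tracked conn = new Tracked("conn");
             Tracked stmt = new Tracked("stmt");
             Tracked res = new Tracked("res")) {
            if (!hasRow) {
                // Stand-in for HiveMetaException (assumption for this sketch).
                throw new IllegalStateException("Didn't find version data in metastore");
            }
            return "placeholder-version";
        }
    }

    public static void main(String[] args) {
        try {
            readVersion(false);
        } catch (IllegalStateException e) {
            // All three resources were closed although the method threw.
            System.out.println("closed after exception: " + closed);
        }
    }
}
```

With real JDBC objects the same structure would apply, since Connection, Statement and ResultSet all implement AutoCloseable.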





[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2014-06-23 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7172:
--

Attachment: (was: HIVE-7172.patch)

 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.





[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-06-23 Thread KangHS (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KangHS updated HIVE-7229:
-

Attachment: (was: HIVE-7229)

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor

 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") 
 {
 partitionValidationPattern = 
 Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.
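
To illustrate why the reference comparison is a problem, here is a minimal sketch. The class and method names are hypothetical and only the two checks mirror the snippet above. An empty string that is a distinct object (for example one built at runtime) passes the != "" test even though it is empty, while isEmpty() compares by value.

```java
public class EmptyRegexCheck {
    /** Reference comparison, as in the quoted snippet. */
    static boolean nonEmptyByReference(String s) {
        return s != null && s != "";
    }

    /** Value comparison, as suggested in the issue. */
    static boolean nonEmptyByValue(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        // An empty string that is not the interned "" literal.
        String fromConfig = new String("");
        System.out.println(nonEmptyByReference(fromConfig)); // true: wrongly treated as non-empty
        System.out.println(nonEmptyByValue(fromConfig));     // false: correctly treated as empty
    }
}
```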





[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-06-23 Thread KangHS (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KangHS updated HIVE-7229:
-

Attachment: HIVE-7229.patch

Changed the String comparison operation.

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7229.patch


 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") 
 {
 partitionValidationPattern = 
 Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.





[jira] [Resolved] (HIVE-5074) Additional information for mini-mr tests

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-5074.
-

Resolution: Won't Fix

Looking into hive.log seems to be enough.

 Additional information for mini-mr tests
 

 Key: HIVE-5074
 URL: https://issues.apache.org/jira/browse/HIVE-5074
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial

 Flaky tests in Test(Negative)MinimrCliDriver are hard to track. Test results 
 for diff errors and exception traces for unexpected exceptions would be 
 helpful for debugging.





Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Lefty Leverenz
Bravo, Szehon and Gopal!

-- Lefty


On Mon, Jun 23, 2014 at 12:53 AM, Gopal V gop...@apache.org wrote:

 On 6/22/14, 8:42 PM, Carl Steinbach wrote:

 The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
 committers on the Apache Hive Project.


 Thanks everyone! And congrats Szehon!

 Cheers,
 Gopal



[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2014-06-23 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7172:
--

Attachment: HIVE-7172.patch

 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.





[jira] [Updated] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2014-06-23 Thread DJ Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DJ Choi updated HIVE-7172:
--

Attachment: (was: HIVE-7172.patch)

 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.





[jira] [Commented] (HIVE-7274) Update PTest2 to JClouds 1.7.3

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040511#comment-14040511
 ] 

Hive QA commented on HIVE-7274:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651928/HIVE-7274.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5669 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/557/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/557/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-557/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651928

 Update PTest2 to JClouds 1.7.3
 --

 Key: HIVE-7274
 URL: https://issues.apache.org/jira/browse/HIVE-7274
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7274.patch


 Required to use newer instance types





[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2597:


Status: Patch Available  (was: Open)

 Repeated key in GROUP BY is erroneously displayed when using DISTINCT
 -

 Key: HIVE-2597
 URL: https://issues.apache.org/jira/browse/HIVE-2597
 Project: Hive
  Issue Type: Bug
Reporter: Alex Rovner
Assignee: Navis
 Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, 
 HIVE-2597.D8967.2.patch


 The following query was simplified for illustration purposes. 
 This works correctly:
 select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
 client_tid
 The intent here is to produce two empty columns in between data.
 The following query does not work:
 select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients 
 cluster by client_tid
 FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 
 The key is not repeated since the aliases were given. Seems like Hive is 
 ignoring the aliases when the distinct keyword is specified.





[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2597:


Attachment: HIVE-2597.3.patch.txt

 Repeated key in GROUP BY is erroneously displayed when using DISTINCT
 -

 Key: HIVE-2597
 URL: https://issues.apache.org/jira/browse/HIVE-2597
 Project: Hive
  Issue Type: Bug
Reporter: Alex Rovner
Assignee: Navis
 Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, 
 HIVE-2597.D8967.2.patch


 The following query was simplified for illustration purposes. 
 This works correctly:
 select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
 client_tid
 The intent here is to produce two empty columns in between data.
 The following query does not work:
 select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients 
 cluster by client_tid
 FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 
 The key is not repeated since the aliases were given. Seems like Hive is 
 ignoring the aliases when the distinct keyword is specified.





[jira] [Commented] (HIVE-7273) Hive job fails due to unable to rename reducer output file

2014-06-23 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040537#comment-14040537
 ] 

Navis commented on HIVE-7273:
-

The directory is expected to be created in ExecDriver before the job is 
submitted. Could you provide a query that reproduces the situation? Failures 
of other tasks in the query can also cause exceptions like the one above.

 Hive job fails due to unable to rename reducer output file
 --

 Key: HIVE-7273
 URL: https://issues.apache.org/jira/browse/HIVE-7273
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: George Wong

 We ran into this issue in our cluster.
 The error message is like this
 {noformat}
 org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output 
 from: 
 hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3
  to: 
 hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
 at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:470)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:407)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
 {noformat}
 The log of NameNode shows
 {noformat}
 2014-06-16 20:43:38,582 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
 FSDirectory.unprotectedRenameTo: failed to rename 
 /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3
  to 
 /tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3
  because destination's parent does not exist
 {noformat}





[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7051:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions





[jira] [Commented] (HIVE-7271) Speed up unit tests

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040550#comment-14040550
 ] 

Hive QA commented on HIVE-7271:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651931/HIVE-7271.5.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5669 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dyn_part3
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/558/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/558/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-558/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651931

 Speed up unit tests
 ---

 Key: HIVE-7271
 URL: https://issues.apache.org/jira/browse/HIVE-7271
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, 
 HIVE-7271.4.patch, HIVE-7271.5.patch


 Did some experiments to see if there's a way to speed up unit tests. 
 TestCliDriver seemed to take a lot of time just spinning up/tearing down 
 JVMs. I was also curious to see if running everything on a ram disk would 
 help.
 Results (I ran tests up to authorization_2):
 - Current setup: 40 minutes
 - Single JVM (not using child JVM to run all queries): 8 minutes
 - Single JVM + ram disk: 7 minutes
 So the ram disk didn't help that much. But running tests in single JVM seems 
 worthwhile doing.





[jira] [Commented] (HIVE-7194) authorization_ctas.q failing on trunk

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040557#comment-14040557
 ] 

Ashutosh Chauhan commented on HIVE-7194:


+1 LGTM

 authorization_ctas.q failing on trunk
 -

 Key: HIVE-7194
 URL: https://issues.apache.org/jira/browse/HIVE-7194
 Project: Hive
  Issue Type: Task
  Components: Authorization
Reporter: Ashutosh Chauhan
Assignee: Thejas M Nair
 Attachments: HIVE-7194.1.patch.txt, HIVE-7194.patch


 Need to update .q.out file





[jira] [Commented] (HIVE-7265) BINARY columns use BytesWritable::getBytes() without ::getLength()

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040560#comment-14040560
 ] 

Ashutosh Chauhan commented on HIVE-7265:


+1

 BINARY columns use BytesWritable::getBytes() without ::getLength()
 --

 Key: HIVE-7265
 URL: https://issues.apache.org/jira/browse/HIVE-7265
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7265.1.patch.txt


 The Text conversion for BINARY columns does 
 {code}
 case BINARY:
 t.set(((BinaryObjectInspector) 
 inputOI).getPrimitiveWritableObject(input).getBytes());
 return t;
 {code}
 This omission was noticed while investigating a different String related bug, 
 in a list of functions which call getBytes() without calling 
 getSize/getLength().
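
The effect of ignoring the valid length can be shown without Hadoop on the classpath. In the sketch below, Buffer is a hypothetical stand-in for a reusable writable whose backing array may be larger than its valid length; it is not the Hadoop class, and the growth factor is an assumption. Converting the whole backing array picks up padding and stale bytes from an earlier, longer value.

```java
import java.nio.charset.StandardCharsets;

public class PaddedBufferSketch {
    /** Hypothetical stand-in: backing array can be larger than the valid length. */
    static final class Buffer {
        private byte[] bytes = new byte[0];
        private int length;

        void set(byte[] src) {
            if (src.length > bytes.length) {
                // Grow with some slack (assumed policy for this sketch).
                bytes = new byte[src.length * 3 / 2];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }
    }

    /** Uses the whole backing array, like calling getBytes() alone. */
    static String unsafeToString(Buffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    /** Uses only the valid prefix, bounded by getLength(). */
    static String safeToString(Buffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Buffer b = new Buffer();
        b.set("hive-binary".getBytes(StandardCharsets.UTF_8));
        b.set("ab".getBytes(StandardCharsets.UTF_8)); // reuse the same buffer
        System.out.println(safeToString(b).length());   // 2
        System.out.println(unsafeToString(b).length()); // 16: stale bytes and padding included
    }
}
```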





[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7205:
---

Status: Open  (was: Patch Available)

 Wrong results when union all of grouping followed by group by with 
 correlation optimization
 ---

 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: dima machlin
Assignee: Navis
Priority: Critical
 Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt


 Use case:
 Table TBL (a string, b string) contains a single row: 'a','a'
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns two rows
 a 1
 a 1
 when set hive.optimize.correlation=true; is in effect.
 With set hive.optimize.correlation=false; it returns the correct result: a 2
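
As a cross-check of the expected answer, the query's semantics can be reproduced in plain Java. This is only a sketch of what the SQL should compute, not of how Hive executes it; the class and method names are hypothetical. Each subquery counts rows per key, the union concatenates both results, and the outer group by sums the counts per key.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnionAllExpected {
    /** select <col>, count(1) ... group by <col> */
    static Map<String, Long> countBy(List<String[]> rows, int col) {
        return rows.stream()
                .collect(Collectors.groupingBy(r -> r[col], TreeMap::new, Collectors.counting()));
    }

    /** Outer query: union all of both grouped subqueries, then sum(cc) group by b. */
    static Map<String, Long> expected(List<String[]> tbl) {
        Map<String, Long> out = new TreeMap<>();
        Stream.concat(countBy(tbl, 1).entrySet().stream(),   // group by b
                      countBy(tbl, 0).entrySet().stream())   // group by a (aliased as b)
              .forEach(e -> out.merge(e.getKey(), e.getValue(), Long::sum));
        return out;
    }

    public static void main(String[] args) {
        // TBL contains the single row ('a','a'); column 0 is a, column 1 is b.
        List<String[]> tbl = Collections.singletonList(new String[] {"a", "a"});
        System.out.println(expected(tbl)); // {a=2}
    }
}
```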
 The plan with correlation optimization :
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 null-subquery1:z-subquery1:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: b
 type: string
   outputColumnNames: b
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: b
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 0
   value expressions:
 expr: _col1
 type: bigint
 null-subquery2:z-subquery2:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: a
 type: string
   outputColumnNames: a
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: a
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 1
   value expressions:
 expr: _col1
 type: bigint
   Reduce Operator Tree:
 Demux Operator
   Group By Operator
 aggregations:
   expr: count(VALUE._col0)
 bucketGroup: false
 keys:
   expr: KEY._col0
   type: string
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   outputColumnNames: _col0, _col1
   Union
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 

[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040565#comment-14040565
 ] 

Ashutosh Chauhan commented on HIVE-7232:


[~gopalv] Do you want to review this one?

 VectorReduceSink is emitting incorrect JOIN keys
 

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, 
 q5.explain.txt, q5.sql


 After HIVE-7121, tpc-h query5 has resulted in incorrect results.
 Thanks to [~navis], it has been tracked down to the auto-parallel settings 
 which were initialized for ReduceSinkOperator, but not for 
 VectorReduceSinkOperator. The vector version inherits, but doesn't call 
 super.initializeOp() or set up the variable correctly from ReduceSinkDesc.
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}





[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040568#comment-14040568
 ] 

Ashutosh Chauhan commented on HIVE-7237:


+1

 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
 -

 Key: HIVE-7237
 URL: https://issues.apache.org/jira/browse/HIVE-7237
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.13.0
 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY
Reporter: Douglas Moore
Assignee: Navis
 Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt


 Running set hive.exec.parallel=true; causes the YARN application instance to 
 linger forever. With set hive.exec.parallel=false; the application goes away 
 as soon as the Hive query completes. The underlying table is an ORC 
 store_sales table compressed with SNAPPY.
 {code}
 set hive.exec.parallel=true;
 select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825;
 {code}
 The query runs under Tez and finishes in < 30 seconds.
 After 30-40 of these jobs the cluster gets to a point where no jobs will 
 finish.





[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040570#comment-14040570
 ] 

Ashutosh Chauhan commented on HIVE-7235:


[~rhbutani] Can you take a look at this one?

 TABLESAMPLE on join table is regarded as alias
 --

 Key: HIVE-7235
 URL: https://issues.apache.org/jira/browse/HIVE-7235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7235.1.patch.txt


 {noformat}
 SELECT c_custkey, o_custkey
 FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on 
 c_custkey = o_custkey;
 {noformat}
 Fails with NPE





[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040579#comment-14040579
 ] 

Ashutosh Chauhan commented on HIVE-3392:


I am fine with reopening this. [~appodictic] What do you think?

 Hive unnecessarily validates table SerDes when dropping a table
 ---

 Key: HIVE-3392
 URL: https://issues.apache.org/jira/browse/HIVE-3392
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Jonathan Natkins
Assignee: Navis
  Labels: patch
 Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, 
 HIVE-3392.Test Case - with_trunk_version.txt


 natty@hadoop1:~$ hive
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> create table test (a int) row format serde 'hive.serde.JSONSerDe';  
   
 OK
 Time taken: 2.399 seconds
 natty@hadoop1:~$ hive
 hive> drop table test;

 FAILED: Hive Internal Error: 
 java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
  SerDe hive.serde.JSONSerDe does not exist))
 java.lang.RuntimeException: 
 MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe 
 hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
 SerDe com.cloudera.hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
   ... 20 more
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> drop table test;
 OK
 Time taken: 0.658 seconds
 hive> 





[jira] [Updated] (HIVE-7265) BINARY columns use BytesWritable::getBytes() without ::getLength()

2014-06-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7265:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 BINARY columns use BytesWritable::getBytes() without ::getLength()
 --

 Key: HIVE-7265
 URL: https://issues.apache.org/jira/browse/HIVE-7265
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7265.1.patch.txt


 The Text conversion for BINARY columns does 
 {code}
 case BINARY:
   t.set(((BinaryObjectInspector) inputOI).getPrimitiveWritableObject(input).getBytes());
   return t;
 {code}
 This omission was noticed while investigating a different String related bug, 
 in a list of functions which call getBytes() without calling 
 getSize/getLength().
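
The hazard can be illustrated without Hadoop on the classpath. The sketch below uses a hypothetical `GrowableBytes` class as a stand-in for a reusable container whose backing array may be longer than its valid data; the class and method names are assumptions for illustration, not the Hive patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reusable byte container whose backing
// array can be longer than the valid data (names are illustrative only).
final class GrowableBytes {
    private byte[] buffer = new byte[0];
    private int length = 0;

    // Copies data in, reusing the buffer when it is already large enough.
    void set(byte[] data) {
        if (data.length > buffer.length) {
            buffer = new byte[data.length];
        }
        System.arraycopy(data, 0, buffer, 0, data.length);
        length = data.length;
    }

    byte[] getBytes() { return buffer; }  // whole backing array
    int getLength() { return length; }    // number of valid bytes

    // Problematic: treats the whole backing array as data.
    static String unsafeText(GrowableBytes b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    // Safer: only the first getLength() bytes are valid.
    static String safeText(GrowableBytes b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GrowableBytes b = new GrowableBytes();
        b.set("abcdef".getBytes(StandardCharsets.UTF_8));
        b.set("xy".getBytes(StandardCharsets.UTF_8)); // buffer is reused
        System.out.println(unsafeText(b)); // stale tail leaks through: xycdef
        System.out.println(safeText(b));   // xy
    }
}
```

The second `set` call reuses the six-byte buffer, so reading it without the length picks up four stale bytes from the earlier value.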





[jira] [Updated] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-23 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7094:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks David!

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, 
 HIVE-7094.5.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.
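
The two steps above can be sketched as a small template-method arrangement. This is a heavily simplified illustration: the class names, the `localWriter` method, and the use of plain lists in place of RecordWriter objects are all assumptions, not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the refactoring: the base class owns the common
// write path and asks the subclass which writer handles the row.
abstract class RecordWriterContainerSketch {
    // Common code shared by both implementations.
    final void write(String partition, String row) {
        localWriter(partition).add(row);
    }

    // Each subclass supplies the writer for the given partition value.
    abstract List<String> localWriter(String partition);

    public static void main(String[] args) {
        StaticWriterSketch sp = new StaticWriterSketch();
        sp.write("ds=1", "r1");
        sp.write("ds=2", "r2");
        DynamicWriterSketch dp = new DynamicWriterSketch();
        dp.write("ds=1", "r1");
        dp.write("ds=2", "r2");
        System.out.println(sp.writer.size());  // 2 rows in the single writer
        System.out.println(dp.writers.size()); // 2 writers, one per partition
    }
}

// Static partitioning: one writer, regardless of the row's partition value.
final class StaticWriterSketch extends RecordWriterContainerSketch {
    final List<String> writer = new ArrayList<>();

    @Override
    List<String> localWriter(String partition) {
        return writer;
    }
}

// Dynamic partitioning: writers are created lazily, one per partition value.
final class DynamicWriterSketch extends RecordWriterContainerSketch {
    final Map<String, List<String>> writers = new HashMap<>();

    @Override
    List<String> localWriter(String partition) {
        return writers.computeIfAbsent(partition, p -> new ArrayList<>());
    }
}
```

With this split, the `if (dynamicPartitioning)` branch disappears from the shared path; the choice is made once, when the subclass is instantiated.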





[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040686#comment-14040686
 ] 

Hive QA commented on HIVE-7159:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651930/HIVE-7159.11.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/559/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/559/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-559/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651930

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, 
 HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, 
 HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, 
 HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.
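
A minimal sketch of why the rewrite is safe, assuming join keys are modelled as nullable Integers and the join is a plain in-memory hash join (the class and method names are hypothetical). Because NULL never satisfies `A.x = B.y`, filtering NULL keys first leaves the inner-join result unchanged while reducing the rows that reach the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: join keys are modelled as nullable Integers, one per row.
final class NotNullPushdownSketch {
    // Drops rows whose join key is NULL, i.e. the "where x is not null" filter.
    static List<Integer> notNull(List<Integer> keys) {
        List<Integer> kept = new ArrayList<>();
        for (Integer k : keys) {
            if (k != null) {
                kept.add(k);
            }
        }
        return kept;
    }

    // Counts inner-join matches for A.x = B.y; NULL keys never match.
    static int innerJoinCount(List<Integer> a, List<Integer> b) {
        Map<Integer, Integer> rowsPerKey = new HashMap<>();
        for (Integer y : b) {
            if (y != null) {
                rowsPerKey.merge(y, 1, Integer::sum);
            }
        }
        int matches = 0;
        for (Integer x : a) {
            if (x != null) {
                matches += rowsPerKey.getOrDefault(x, 0);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, null, 2, null, 2);
        List<Integer> b = Arrays.asList(2, null, 3, 2);
        System.out.println(innerJoinCount(a, b));                   // 4
        System.out.println(innerJoinCount(notNull(a), notNull(b))); // 4, same result
        System.out.println(a.size() + b.size());                    // 9 rows before filtering
        System.out.println(notNull(a).size() + notNull(b).size());  // 6 rows after filtering
    }
}
```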





Re: branch for cbo work

2014-06-23 Thread Xuefu Zhang
Hi Ashutosh,

Thanks for the information. I do have some questions, but my concern is
more about the design doc than the branching. Nevertheless, I think it would be
very helpful to clarify the design before we actually put in a lot of
effort.

From the design doc, it seems that the cost estimation is based on Tez,
while the optimization occurs at the logical layer. I'd think that CBO is
valuable to either engine. If there is anything that's specific to a
particular engine, then that optimization should stay at the engine layer.

My original comments were posted on HIVE-5775. Please let me know your
thoughts. I'd also like to hear from the community.

https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987

Thanks,
Xuefu


On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Hi all,

 Some of you may have noticed that cost based optimizer work is going on at
 HIVE-5775. John has put up an initial patch there as well, but there is a
 lot more work that needs to be done. Following our tradition of doing large
 feature work in a branch, I propose that we create a branch, commit this
 patch to it, and then continue to work on it in the branch to improve it.
 Hopefully, we can get it in shape so that we can merge it into trunk once it's
 ready. Unless I hear otherwise, I plan to create a branch and commit this
 initial patch by early next week.


 Design doc is located here :

 https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive


 Thanks,

 Ashutosh



[jira] [Commented] (HIVE-7270) SerDe Properties are not considered by show create table Command

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040784#comment-14040784
 ] 

Hive QA commented on HIVE-7270:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651932/HIVE-7270.1.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5669 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/564/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/564/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-564/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651932

 SerDe Properties are not considered by show create table Command
 

 Key: HIVE-7270
 URL: https://issues.apache.org/jira/browse/HIVE-7270
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
Reporter: Renil J
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7270.1.patch.txt


 The Hive table DDL generated by the show create table target_table command does 
 not contain the SerDe properties of the target table, even though the table has 
 specific SerDe properties. 





[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040788#comment-14040788
 ] 

Hive QA commented on HIVE-7229:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651937/HIVE-7229.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/566/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/566/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-566/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-566/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1604814.

At revision 1604814.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651937

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7229.patch


 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") {
     partitionValidationPattern = Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.
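
A small self-contained sketch of the difference, using hypothetical helper names. In Java, `!=` on strings compares object identity, so an empty string built at runtime is not the same object as the `""` literal and passes the reference check.

```java
// Minimal sketch: reference comparison versus content check for empty strings.
final class EmptyStringCheck {
    // Mirrors the reported pattern: != compares object identity, not contents.
    static boolean looksNonEmptyByReference(String s) {
        return s != null && s != "";
    }

    // Alternative suggested in the report: test the contents with isEmpty().
    static boolean isNonEmpty(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        // A runtime-built empty string is a distinct object from the "" literal.
        String fromConfig = new String("");
        System.out.println(looksNonEmptyByReference(fromConfig)); // true (misleading)
        System.out.println(isNonEmpty(fromConfig));               // false
    }
}
```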





[jira] [Commented] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040786#comment-14040786
 ] 

Hive QA commented on HIVE-7172:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651942/HIVE-7172.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/565/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/565/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-565/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-565/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientpositive/show_create_table_serde.q.out'
Reverted 'ql/src/test/queries/clientpositive/show_create_table_serde.q'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target 
itests/custom-serde/target itests/util/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1604814.

At revision 1604814.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651942

 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
 -

 Key: HIVE-7172
 URL: https://issues.apache.org/jira/browse/HIVE-7172
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7172.patch


 {code}
   ResultSet res = stmt.executeQuery(versionQuery);
   if (!res.next()) {
     throw new HiveMetaException("Didn't find version data in metastore");
   }
   String currentSchemaVersion = res.getString(1);
   metastoreConn.close();
 {code}
 When HiveMetaException is thrown, metastoreConn.close() would be skipped.
 stmt is not closed upon return from the method.
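
One possible remedy is to let the language close the resources on every exit path. The sketch below assumes Java 7 try-with-resources is available and uses a hypothetical `Tracked` class in place of the real JDBC connection and statement; it is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of closing resources on both the normal and the exceptional path,
// using a hypothetical AutoCloseable stand-in instead of real JDBC objects.
final class CloseOnFailure {
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> closed;

        Tracked(String name, List<String> closed) {
            this.name = name;
            this.closed = closed;
        }

        @Override
        public void close() {
            closed.add(name); // record that this resource was released
        }
    }

    // Returns the names of the resources that were closed, in close order.
    static List<String> readVersion(boolean hasRow) {
        List<String> closed = new ArrayList<>();
        try (Tracked conn = new Tracked("conn", closed);
             Tracked stmt = new Tracked("stmt", closed)) {
            if (!hasRow) {
                throw new IllegalStateException("Didn't find version data in metastore");
            }
        } catch (IllegalStateException e) {
            // Both resources have already been closed by the time we get here.
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(readVersion(false)); // [stmt, conn]
        System.out.println(readVersion(true));  // [stmt, conn]
    }
}
```

Resources are closed in reverse order of declaration, so the statement is released before the connection whether or not the exception is thrown.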





[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table

2014-06-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040802#comment-14040802
 ] 

Edward Capriolo commented on HIVE-3392:
---

Please feel free to take over the review. I will not have any time at the 
moment. Thanks!

 Hive unnecessarily validates table SerDes when dropping a table
 ---

 Key: HIVE-3392
 URL: https://issues.apache.org/jira/browse/HIVE-3392
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Jonathan Natkins
Assignee: Navis
  Labels: patch
 Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, 
 HIVE-3392.Test Case - with_trunk_version.txt


 natty@hadoop1:~$ hive
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> create table test (a int) row format serde 'hive.serde.JSONSerDe';
   
 OK
 Time taken: 2.399 seconds
 natty@hadoop1:~$ hive
 hive> drop table test;

 FAILED: Hive Internal Error: 
 java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
  SerDe hive.serde.JSONSerDe does not exist))
 java.lang.RuntimeException: 
 MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe 
 hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
 SerDe com.cloudera.hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
   ... 20 more
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> drop table test;
 OK
 Time taken: 0.658 seconds
 hive> 





Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread kulkarni.swar...@gmail.com
Congratulations guys!


On Mon, Jun 23, 2014 at 2:09 AM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Bravo, Szehon and Gopal!

 -- Lefty


 On Mon, Jun 23, 2014 at 12:53 AM, Gopal V gop...@apache.org wrote:

  On 6/22/14, 8:42 PM, Carl Steinbach wrote:
 
  The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
  committers on the Apache Hive Project.
 
 
  Thanks everyone! And congrats Szehon!
 
  Cheers,
  Gopal
 




-- 
Swarnim


Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Szehon Ho
Thank you all very much, and congrats Gopal!
Szehon


On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote:

 The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
 committers on the Apache Hive Project.

 Please join me in congratulating Gopal and Szehon!

 Thanks.

 - Carl



[jira] [Commented] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040900#comment-14040900
 ] 

Hive QA commented on HIVE-2597:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651945/HIVE-2597.3.patch.txt

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 5670 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join1
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join3
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join5
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join6
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join7
org.apache.hadoop.hive.ql.parse.TestParse.testParse_join8
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/568/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/568/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-568/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651945

 Repeated key in GROUP BY is erroneously displayed when using DISTINCT
 -

 Key: HIVE-2597
 URL: https://issues.apache.org/jira/browse/HIVE-2597
 Project: Hive
  Issue Type: Bug
Reporter: Alex Rovner
Assignee: Navis
 Attachments: HIVE-2597.3.patch.txt, HIVE-2597.D8967.1.patch, 
 HIVE-2597.D8967.2.patch


 The following query was simplified for illustration purposes. 
 This works correctly:
 select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
 client_tid
 The intent here is to produce two empty columns in between data.
 The following query does not work:
 select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients 
 cluster by client_tid
 FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 
 The key is not repeated since the aliases were given. Seems like Hive is 
 ignoring the aliases when the distinct keyword is specified.





[jira] [Commented] (HIVE-7118) Oracle upgrade schema scripts do not map Java long datatype columns correctly for transaction related tables

2014-06-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040929#comment-14040929
 ] 

Alan Gates commented on HIVE-7118:
--

+1

 Oracle upgrade schema scripts do not map Java long datatype columns correctly 
 for transaction related tables
 

 Key: HIVE-7118
 URL: https://issues.apache.org/jira/browse/HIVE-7118
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
 Environment: Oracle DB
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7118-0.13.0.1.patch, HIVE-7118.1.patch


 In transaction-related tables, Java long column fields are mapped to 
 NUMBER(10), which is too narrow for the transaction ids and causes them to fail 
 to persist. The following error is seen:
 {noformat}
 ORA-01438: value larger than specified precision allowed for this column
 {noformat}
 NO PRECOMMIT TESTS
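
The arithmetic behind the error can be checked with a short sketch. It assumes an Oracle NUMBER(p) column with scale 0 holds at most p significant decimal digits; the `fits` helper is hypothetical and only counts digits, it does not talk to a database.

```java
import java.math.BigDecimal;

// Sketch: does a Java long fit in a NUMBER(precision) column with scale 0?
final class OraclePrecisionCheck {
    // True when the value has no more decimal digits than the given precision.
    static boolean fits(long value, int precision) {
        return BigDecimal.valueOf(value).precision() <= precision;
    }

    public static void main(String[] args) {
        System.out.println(fits(9_999_999_999L, 10));  // true: largest 10-digit value
        System.out.println(fits(10_000_000_000L, 10)); // false: needs 11 digits
        System.out.println(fits(Long.MAX_VALUE, 10));  // false
        System.out.println(fits(Long.MAX_VALUE, 19));  // true: long needs up to 19 digits
    }
}
```

Since `Long.MAX_VALUE` is 9223372036854775807, a column needs a precision of 19 to cover the full range of a Java long.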





Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Vaibhav Gumashta
Congrats Gopal and Szehon!

--Vaibhav


On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote:

 Thank you all very much, and congrats Gopal!
 Szehon


 On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote:

  The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
  committers on the Apache Hive Project.
 
  Please join me in congratulating Gopal and Szehon!
 
  Thanks.
 
  - Carl
 


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Xiaobing Zhou
Congrats!



On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta vgumas...@hortonworks.com
 wrote:

 Congrats Gopal and Szehon!

 --Vaibhav


 On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote:

  Thank you all very much, and congrats Gopal!
  Szehon
 
 
  On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org wrote:
 
   The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon
 Ho
   committers on the Apache Hive Project.
  
   Please join me in congratulating Gopal and Szehon!
  
   Thanks.
  
   - Carl
  
 





[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040970#comment-14040970
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The cost model as described in the doc assumes TEZ as the execution layer.


 Introduce Cost Based Optimizer to Hive
 --

 Key: HIVE-5775
 URL: https://issues.apache.org/jira/browse/HIVE-5775
 Project: Hive
  Issue Type: New Feature
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: CBO-2.pdf, HIVE-5775.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040986#comment-14040986
 ] 

Gopal V commented on HIVE-5775:
---

[~xuefuz]: The CBO model rewrites queries using cardinality statistics.

The tuple count and distinct value count should not affect which physical layer 
it runs on - having the CBO split up/reorder a 3-way map-join into 2 phases (or 
vertices) should generate identical plans in both.

MR would run 2 Map-only phases with their own local tasks and hashtable 
uploads, Tez would run 2 vertices with their own broadcast tasks.

Tez can reduce runtimes further by removing the intermediate IO cost & 
co-scheduling the second vertex in the same container as the first - but that is 
not assumed as it is not a strong guarantee in a busy cluster.

The Tez runtime model is faster, but the logical cost does not change as the 
number of rows read off disk, written to disk and distinct keys remain the same.

In fact, as it exists today, because it applies equally to both Tez & MR, it 
ignores a lot of Tez's opportunistic/runtime optimizations like container-reuse 
- e.g. each vertex in Tez is assumed to be a different process.

It is up to the Tez DAG planner to attend to such runtime optimization details.
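
To make the point about engine-independent logical cost concrete, here is a minimal sketch in Java. It is not the HIVE-5775 patch: the class `JoinOrderSketch`, the `TableStats` holder, the textbook cardinality estimate and the sample numbers are all assumptions made for illustration. It only shows that a cost built from row counts and distinct-value counts ranks join orders without referring to MR or Tez.

```java
import java.util.Arrays;
import java.util.List;

public class JoinOrderSketch {

    /** Hypothetical per-table statistics: row count and number of distinct join-key values (NDV). */
    static final class TableStats {
        final String name;
        final long rows;
        final long ndv;

        TableStats(String name, long rows, long ndv) {
            this.name = name;
            this.rows = rows;
            this.ndv = ndv;
        }
    }

    /** Textbook equi-join estimate (an assumption): |L| * |R| / max(ndv(L), ndv(R)). */
    static long joinCardinality(long leftRows, long leftNdv, long rightRows, long rightNdv) {
        return leftRows * rightRows / Math.max(leftNdv, rightNdv);
    }

    /** Logical cost of a left-deep join order: total rows produced by all joins in the order. */
    static long logicalCost(List<TableStats> order) {
        long rows = order.get(0).rows;
        long ndv = order.get(0).ndv;
        long cost = 0;
        for (TableStats next : order.subList(1, order.size())) {
            rows = joinCardinality(rows, ndv, next.rows, next.ndv);
            ndv = Math.min(ndv, next.ndv); // the join key cannot gain distinct values
            cost += rows;
        }
        return cost;
    }

    public static void main(String[] args) {
        TableStats fact = new TableStats("fact", 1_000_000L, 1_000L);
        TableStats dimA = new TableStats("dim_a", 1_000L, 1_000L);
        TableStats dimB = new TableStats("dim_b", 10L, 10L);

        // Same tables, two orders: joining the selective dimension first is cheaper.
        System.out.println(logicalCost(Arrays.asList(fact, dimA, dimB))); // 1010000
        System.out.println(logicalCost(Arrays.asList(fact, dimB, dimA))); // 20000
    }
}
```

Nothing in `logicalCost` depends on the execution engine, so under these assumptions the same order would be preferred on either engine.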



Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Hari Subramaniyan
congrats to Gopal and Szehon!

Thanks
Hari


On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com
wrote:

 Congrats!



 On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta 
 vgumas...@hortonworks.com
  wrote:

  Congrats Gopal and Szehon!
 
  --Vaibhav
 
 
  On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote:
 
   Thank you all very much, and congrats Gopal!
   Szehon
  
  
   On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org
 wrote:
  
The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon
  Ho
committers on the Apache Hive Project.
   
Please join me in congratulating Gopal and Szehon!
   
Thanks.
   
- Carl
   
  
 


Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Jason Dere
Congrats!

On Jun 23, 2014, at 10:28 AM, Hari Subramaniyan hsubramani...@hortonworks.com 
wrote:

 congrats to Gopal and Szehon!
 
 Thanks
 Hari
 
 
 On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com
 wrote:
 
 Congrats!
 
 
 
 On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta 
 vgumas...@hortonworks.com
 wrote:
 
 Congrats Gopal and Szehon!
 
 --Vaibhav
 
 
 On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com wrote:
 
 Thank you all very much, and congrats Gopal!
 Szehon
 
 
 On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org
 wrote:
 
 The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon
 Ho
 committers on the Apache Hive Project.
 
 Please join me in congratulating Gopal and Szehon!
 
 Thanks.
 
 - Carl
 
 
 


Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread Vikram Dixit
Congrats Gopal and Szehon!


On Mon, Jun 23, 2014 at 10:34 AM, Jason Dere jd...@hortonworks.com wrote:

 Congrats!

 On Jun 23, 2014, at 10:28 AM, Hari Subramaniyan 
 hsubramani...@hortonworks.com wrote:

  congrats to Gopal and Szehon!
 
  Thanks
  Hari
 
 
  On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou xz...@hortonworks.com
  wrote:
 
  Congrats!
 
 
 
  On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta 
  vgumas...@hortonworks.com
  wrote:
 
  Congrats Gopal and Szehon!
 
  --Vaibhav
 
 
  On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho sze...@cloudera.com
 wrote:
 
  Thank you all very much, and congrats Gopal!
  Szehon
 
 
  On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach c...@apache.org
  wrote:
 
  The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon
  Ho
  committers on the Apache Hive Project.
 
  Please join me in congratulating Gopal and Szehon!
 
  Thanks.
 
  - Carl
 
 
 


Re: branch for cbo work

2014-06-23 Thread John Pullokkaran
The following may help reduce the confusion:

1. In the design doc, the cost formula is for choosing the Join Algorithm. The
cost formula as described in the doc assumes Tez execution.

2. However, the current work on CBO doesn't include Join Algorithm selection.
Instead it reorders Joins based on Join cardinality & NDV. In other words,
Join reordering does not depend on the Physical Execution Layer (Tez or MR).

3. When we decide to do Join Algorithm selection, we can fit in cost formulas
for both a) MR and b) Tez. This way, based on the physical execution layer, we
can select the best Join Algorithm/Order.

4. The cost formula for Join Algorithm selection is not that different
between MR & Tez (except for intermediate HDFS writes). So we can assume that
CBO can support both execution layers rather easily.

5. The CBO framework allows you to plug and play any cost model. There is no
hard coupling.
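
Points 3 to 5 can be sketched in Java as follows. This is only an illustration under stated assumptions: `JoinCostModel`, `MrCostModel`, `TezCostModel` and `forEngine` are hypothetical names, not classes from the patch, and both formulas are invented placeholders. The only difference modelled is the intermediate HDFS write and re-read that MR pays between phases.

```java
public class PluggableCostModelSketch {

    /** Hypothetical plug-in point: cost of one join step given input and output row counts. */
    interface JoinCostModel {
        double joinCost(long leftRows, long rightRows, long outputRows);
    }

    /** Assumed MR formula: read both inputs, then write the intermediate result to HDFS and read it back. */
    static final class MrCostModel implements JoinCostModel {
        @Override
        public double joinCost(long leftRows, long rightRows, long outputRows) {
            return leftRows + rightRows + 2.0 * outputRows;
        }
    }

    /** Assumed Tez formula: same input reads, but no intermediate HDFS round trip. */
    static final class TezCostModel implements JoinCostModel {
        @Override
        public double joinCost(long leftRows, long rightRows, long outputRows) {
            return leftRows + rightRows;
        }
    }

    /** Picks the cost model from the execution engine name ("mr" or "tez" in this sketch). */
    static JoinCostModel forEngine(String engine) {
        switch (engine) {
            case "mr":
                return new MrCostModel();
            case "tez":
                return new TezCostModel();
            default:
                throw new IllegalArgumentException("no cost model registered for engine: " + engine);
        }
    }

    public static void main(String[] args) {
        System.out.println(forEngine("mr").joinCost(100, 10, 50));  // 210.0
        System.out.println(forEngine("tez").joinCost(100, 10, 50)); // 110.0
    }
}
```

Because the optimizer would only see the `JoinCostModel` interface, a model for another engine could be added without touching the join-ordering logic.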

Thanks
John



On Mon, Jun 23, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote:

 Hi Ashutosh,

 Thanks for your information. I do have some questions, but my concern is
 more on the design doc than branching. Nevertheless, I think it would be
 very helpful to clarify in the design before we actually put a lot of
 effort.

 From the design doc, it seems that the cost estimation is based on Tez,
 while the optimization occurs on the logical layer. I'd think that CBO is
 valuable to either engine. If there is anything that's specific to a
 particular engine, then that optimization should stay at the engine
 layer.

 My original comments were posted on HIVE-5775. Please let me know your
 thoughts. I'd also like to hear from the community.


 https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987

 Thanks,
 Xuefu


 On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Hi all,
 
  Some of you may have noticed that cost based optimizer work is going on at
  HIVE-5775. John has put up an initial patch there as well. But there is a
  lot more work that needs to be done. Following our tradition of large
  feature work in a branch, I propose that we create a branch and commit this
  patch in it and then continue to work on it in the branch to improve it.
  Hopefully, we can get it in shape so that we can merge it into trunk once
  it's ready. Unless I hear otherwise, I plan to create a branch and commit
  this initial patch by early next week.
 
 
  Design doc is located here :
 
 
 https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
 
 
  Thanks,
 
  Ashutosh
 




[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041053#comment-14041053
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The following may help reduce the confusion:

1. In the design doc, the cost formula is for choosing the Join Algorithm. The 
cost formula as described in the doc assumes Tez execution.

2. However, the current work on CBO doesn't include Join Algorithm selection. 
Instead it reorders Joins based on Join cardinality & NDV. In other words, Join 
reordering does not depend on the Physical Execution Layer (Tez or MR).

3. When we decide to do Join Algorithm selection, we can fit in cost formulas for 
both a) MR and b) Tez. This way, based on the physical execution layer, we can 
select the best Join Algorithm/Order.

4. The cost formula for Join Algorithm selection is not that different between 
MR & Tez (except for intermediate HDFS writes). So we can assume that CBO can 
support both execution layers rather easily.

5. The CBO framework allows you to plug and play any cost model. There is no hard 
coupling.




[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041056#comment-14041056
 ] 

Xuefu Zhang commented on HIVE-5775:
---

Thanks for the clarification, [~gopalv]. We are in total agreement if what is 
put in the logical layer is the optimization that's applicable to either 
execution engine and if execution engine specific optimization is put in the 
execution layer. Maybe the document can be updated to make this explicit to 
avoid confusion/misunderstanding from others.

{quote}
The cost model as described in the doc assumes TEZ as the execution layer.
{quote}

Not sure if I understand [~jpullokkaran] correctly. If the cost model is based 
on Tez, then we shall only use a model that's common for both Tez and MR when 
rewriting the query, right?



[jira] [Commented] (HIVE-7241) Wrong lock acquired for alter table rename partition

2014-06-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041061#comment-14041061
 ] 

Alan Gates commented on HIVE-7241:
--

root_dir_external_table and authorization_ctas fail on trunk.  The other two 
pass in my local tests on both trunk and with my patch, so I do not believe any 
of these are related to the patch.

 Wrong lock acquired for alter table rename partition
 

 Key: HIVE-7241
 URL: https://issues.apache.org/jira/browse/HIVE-7241
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7241.patch, HIVE-7241.patch


 Doing an alter table foo partition (bar='x') rename to partition (bar='y') 
 acquires a read lock on table foo.  It should instead acquire an exclusive 
 lock on partition bar=x.
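
 Why the lock type matters can be sketched with a small compatibility check in Java. The enum and the matrix below are assumptions for illustration (a conventional read / write / exclusive scheme), not code taken from the Hive lock manager or from the attached patch.

```java
public class LockCompatibilitySketch {

    /** Illustrative lock levels, weakest to strongest. */
    enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    /**
     * Assumed compatibility rules: an exclusive lock conflicts with everything,
     * two shared-write locks conflict with each other, and everything else can coexist.
     */
    static boolean compatible(LockType held, LockType requested) {
        if (held == LockType.EXCLUSIVE || requested == LockType.EXCLUSIVE) {
            return false;
        }
        return !(held == LockType.SHARED_WRITE && requested == LockType.SHARED_WRITE);
    }

    public static void main(String[] args) {
        // With only a read lock held by the rename, a concurrent reader is still admitted.
        System.out.println(compatible(LockType.SHARED_READ, LockType.SHARED_READ)); // true
        // With an exclusive lock held on the partition, the reader has to wait.
        System.out.println(compatible(LockType.EXCLUSIVE, LockType.SHARED_READ));   // false
    }
}
```

 Under these assumed rules, a rename that takes only a read lock would not block other readers or writers of the partition while it is being renamed, which is the behaviour the issue describes as wrong.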





[jira] [Commented] (HIVE-7274) Update PTest2 to JClouds 1.7.3

2014-06-23 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041075#comment-14041075
 ] 

Szehon Ho commented on HIVE-7274:
-

Thanks for researching that.  +1 

 Update PTest2 to JClouds 1.7.3
 --

 Key: HIVE-7274
 URL: https://issues.apache.org/jira/browse/HIVE-7274
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7274.patch


 Required to use newer instance types





[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041085#comment-14041085
 ] 

Laljo John Pullokkaran commented on HIVE-5775:
--

The Cost Model described doesn't apply to the current CBO work or to the 
proposed branch.
It will apply only to Join Algorithm selection, which is not part of the 
current work.

IMO moving join reordering to the physical optimizer is not the correct 
solution. I would rather leave it in the logical layer, since after doing join 
reordering you may be able to do other optimizations like new predicate 
pushdown, transitive inferences, etc.

When we get around to doing Join Algorithm selection there will be two cost 
formulas, one for MR and one for Tez.
I think the best solution is to support both cost models and decide which one to 
apply based on the physical execution layer.

I will update the doc. 



Re: branch for cbo work

2014-06-23 Thread Xuefu Zhang
Thanks for the clarification. I'm happily on board with this as long as our
approach takes account of the differences between execution engines. While
MR and Tez might be similar, there could be new execution engines in the
future which might not be that similar. Ideally, all execution engines
should benefit from this effort yet room is kept to allow for specific
optimizations for a particular engine. It's great if we all see that.

Thanks,
Xuefu


On Mon, Jun 23, 2014 at 11:05 AM, John Pullokkaran 
jpullokka...@hortonworks.com wrote:

 The following may help reduce the confusion:

 1. In the design doc, the cost formula is for choosing the Join Algorithm. The
 cost formula as described in the doc assumes Tez execution.

 2. However, the current work on CBO doesn't include Join Algorithm selection.
 Instead it reorders Joins based on Join cardinality & NDV. In other words,
 Join reordering does not depend on the Physical Execution Layer (Tez or MR).

 3. When we decide to do Join Algorithm selection, we can fit in cost formulas
 for both a) MR and b) Tez. This way, based on the physical execution layer, we
 can select the best Join Algorithm/Order.

 4. The cost formula for Join Algorithm selection is not that different
 between MR & Tez (except for intermediate HDFS writes). So we can assume that
 CBO can support both execution layers rather easily.

 5. The CBO framework allows you to plug and play any cost model. There is no
 hard coupling.

 Thanks
 John



 On Mon, Jun 23, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote:

  Hi Ashutosh,
 
  Thanks for your information. I do have some questions, but my concern is
  more on the design doc than branching. Nevertheless, I think it would be
  very helpful to clarify in the design before we actually put a lot of
  effort.
 
  From the design doc, it seems that the cost estimation is based on Tez,
  while the optimization occurs on the logical layer. I'd think that CBO is
  valuable to either engine. If there is anything that's specific to a
  particular engine, then that optimization should stay at the engine
  layer.
 
  My original comments were posted on HIVE-5775. Please let me know your
  thoughts. I'd also like to hear from the community.
 
 
 
 https://issues.apache.org/jira/browse/HIVE-5775?focusedCommentId=14039987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039987
 
  Thanks,
  Xuefu
 
 
  On Thu, Jun 19, 2014 at 10:34 PM, Ashutosh Chauhan hashut...@apache.org
 
  wrote:
 
   Hi all,
  
   Some of you may have noticed that cost based optimizer work is going on at
   HIVE-5775. John has put up an initial patch there as well. But there is a
   lot more work that needs to be done. Following our tradition of large
   feature work in a branch, I propose that we create a branch and commit this
   patch in it and then continue to work on it in the branch to improve it.
   Hopefully, we can get it in shape so that we can merge it into trunk once
   it's ready. Unless I hear otherwise, I plan to create a branch and commit
   this initial patch by early next week.
  
  
   Design doc is located here :
  
  
 
 https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
  
  
   Thanks,
  
   Ashutosh
  
 




[jira] [Commented] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock

2014-06-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041101#comment-14041101
 ] 

Alan Gates commented on HIVE-7242:
--

dynpart_sort_optimization passes on both trunk and with the patch when I run 
it.  root_dir_external_table and authorization_ctas are broken on trunk right 
now.  So I don't believe these test failures are related to this patch.

 alter table drop partition is acquiring the wrong type of lock
 --

 Key: HIVE-7242
 URL: https://issues.apache.org/jira/browse/HIVE-7242
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.14.0

 Attachments: HIVE-7242.patch


 Doing an alter table foo drop partition ('bar=x') acquired a shared-write 
 lock on partition bar=x.  It should be acquiring an exclusive lock in that 
 case.





Re: branch for cbo work

2014-06-23 Thread John Pullokkaran
I see that the design doc doesn't talk about the plug-and-play aspect of the
cost model. It also doesn't make clear that the cost model described is for
Join Algorithm selection, and it doesn't have a cost model for MR.

I will update the doc accordingly.

Thanks
John



[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation

2014-06-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7246:
-

Status: Open  (was: Patch Available)

Patch no longer applies after recent checkins. 

 Hive transaction manager hardwires bonecp as the JDBC pooling implementation
 

 Key: HIVE-7246
 URL: https://issues.apache.org/jira/browse/HIVE-7246
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7246.patch


 Currently TxnManager hardwires BoneCP as the JDBC connection pooling 
 implementation.  Instead it should use the same connection pooling that the 
 metastore does.





[jira] [Commented] (HIVE-5775) Introduce Cost Based Optimizer to Hive

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041115#comment-14041115
 ] 

Xuefu Zhang commented on HIVE-5775:
---

Cool. Thanks for the clarifications.



[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation

2014-06-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7246:
-

Status: Patch Available  (was: Open)

 Hive transaction manager hardwires bonecp as the JDBC pooling implementation
 

 Key: HIVE-7246
 URL: https://issues.apache.org/jira/browse/HIVE-7246
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7246-2.patch, HIVE-7246.patch


 Currently TxnManager hardwires BoneCP as the JDBC connection pooling 
 implementation.  Instead it should use the same connection pooling that the 
 metastore does.





[jira] [Updated] (HIVE-7246) Hive transaction manager hardwires bonecp as the JDBC pooling implementation

2014-06-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7246:
-

Attachment: HIVE-7246-2.patch

Rebased patch.



[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041126#comment-14041126
 ] 

Xuefu Zhang commented on HIVE-7229:
---

[~HS] Thanks for working on this. It seemed that your patch didn't apply to 
trunk somehow. Could you check/rebase if necessary? On a side note, the 
following is equivalent:
{code}
partitionValidationRegex != null && !partitionValidationRegex.equals("")
{code}
{code}
"".equals(partitionValidationRegex)
{code}


 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7229.patch


 Around line 423:
 {code}
   if (partitionValidationRegex != null && partitionValidationRegex != "") {
     partitionValidationPattern = Pattern.compile(partitionValidationRegex);
 {code}
 partitionValidationRegex.isEmpty() can be used instead.
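
 The difference between the reference comparison and the suggested isEmpty() check can be shown with a small standalone Java example. The class and method names here are made up for the illustration, and the empty string is built with `new String("")` only to stand in for a value that does not come from a literal (for example, one read from configuration).

```java
public class EmptyStringCheckSketch {

    /** Mirrors the reported pattern: != compares object references, not contents. */
    static boolean nonEmptyByReference(String regex) {
        return regex != null && regex != "";
    }

    /** The suggested fix: isEmpty() looks at the contents of the string. */
    static boolean nonEmptyByValue(String regex) {
        return regex != null && !regex.isEmpty();
    }

    public static void main(String[] args) {
        // An empty string that is a different object from the "" literal.
        String fromConfig = new String("");

        System.out.println(nonEmptyByReference(fromConfig)); // true: wrongly treated as non-empty
        System.out.println(nonEmptyByValue(fromConfig));     // false
        System.out.println(nonEmptyByValue("p=.*"));         // true
        System.out.println(nonEmptyByValue(null));           // false
    }
}
```

 With the reference comparison, an empty regex that is not the interned literal would still reach Pattern.compile, which is the behaviour the report is pointing at.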





[jira] [Commented] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()

2014-06-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041132#comment-14041132
 ] 

Xuefu Zhang commented on HIVE-7229:
---

Never mind about above code snippet.

 String is compared using equal in HiveMetaStore#HMSHandler#init()
 -

 Key: HIVE-7229
 URL: https://issues.apache.org/jira/browse/HIVE-7229
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7229.patch




[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()

2014-06-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041135#comment-14041135
 ] 

Alan Gates commented on HIVE-7249:
--

TestOrcDynamicPartitioned runs fine in my tests.  But I ran it as is.  Did you 
turn on the DbTxnManager and then run the test, or run it as is?

 HiveTxnManager.closeTxnManger() throws if called after commitTxn()
 --

 Key: HIVE-7249
 URL: https://issues.apache.org/jira/browse/HIVE-7249
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Alan Gates
 Attachments: HIVE-7249.patch


  I call openTxn() and acquireLocks() for a query that looks like INSERT INTO T 
 PARTITION(p) SELECT * FROM T.
 Then I call commitTxn().  When I then call closeTxnManager(), I get an exception 
 saying the lock was not found (it is the only lock in this txn).  So it seems TxnMgr 
 doesn't know that the commit released the locks.
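
 The suspected failure mode described above can be sketched as follows. This is a simplified model under stated assumptions, not Hive's implementation: TxnLockSketch and its methods are hypothetical, serverLocks stands in for the HIVE_LOCKS table, and the forgetLocally flag only illustrates the difference between keeping and clearing client-side bookkeeping at commit time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; the class and method names are not Hive's.
final class TxnLockSketch {
    // Stands in for the HIVE_LOCKS table on the metastore side.
    private final Set<Long> serverLocks = new HashSet<>();
    // Client-side record of the locks this manager believes it still holds.
    private final Set<Long> clientLocks = new HashSet<>();

    void acquireLock(long lockId) {
        serverLocks.add(lockId);
        clientLocks.add(lockId);
    }

    // Committing deletes the transaction's locks on the server side.
    // If forgetLocally is false, the client keeps stale lock ids.
    void commitTxn(boolean forgetLocally) {
        serverLocks.clear();
        if (forgetLocally) {
            clientLocks.clear();
        }
    }

    // Closing unlocks everything the client still tracks.
    void close() {
        for (long id : clientLocks) {
            if (!serverLocks.remove(id)) {
                throw new IllegalStateException("No such lock: " + id);
            }
        }
        clientLocks.clear();
    }

    public static void main(String[] args) {
        TxnLockSketch stale = new TxnLockSketch();
        stale.acquireLock(1L);
        stale.commitTxn(false);
        try {
            stale.close();
        } catch (IllegalStateException e) {
            System.out.println("stale bookkeeping: " + e.getMessage());
        }

        TxnLockSketch cleared = new TxnLockSketch();
        cleared.acquireLock(1L);
        cleared.commitTxn(true);
        cleared.close(); // nothing left to unlock, so no exception
        System.out.println("cleared bookkeeping: close() succeeded");
    }
}
```
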
 Here is the stack trace and some log output, which may be useful:
 {noformat}
 2014-06-17 15:54:40,771 DEBUG mapreduce.TransactionContext 
 (TransactionContext.java:onCommitJob(128)) - 
 onCommitJob(job_local557130041_0001). this=46719652
 2014-06-17 15:54:40,771 DEBUG lockmgr.DbTxnManager 
 (DbTxnManager.java:commitTxn(205)) - Committing txn 1
 2014-06-17 15:54:40,771 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) 
 - Going to execute query values current_timestamp
 2014-06-17 15:54:40,772 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1423)) - Going to execute query select 
 txn_state from TXNS where txn_id = 1 for\
  update
 2014-06-17 15:54:40,773 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1438)) - Going to execute update update TXNS 
 set txn_last_heartbeat = 140304568\
 0772 where txn_id = 1
 2014-06-17 15:54:40,778 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1440)) - Going to commit
 2014-06-17 15:54:40,779 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(344)) 
 - Going to execute insert insert into COMPLETED_TXN_COMPONENTS select tc_txn\
 id, tc_database, tc_table, tc_partition from TXN_COMPONENTS where tc_txnid = 
 1
 2014-06-17 15:54:40,784 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(352)) 
 - Going to execute update delete from TXN_COMPONENTS where tc_txnid = 1
 2014-06-17 15:54:40,788 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(356)) 
 - Going to execute update delete from HIVE_LOCKS where hl_txnid = 1
 2014-06-17 15:54:40,791 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(359)) 
 - Going to execute update delete from TXNS where txn_id = 1
 2014-06-17 15:54:40,794 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(361)) 
 - Going to commit
 2014-06-17 15:54:40,795 WARN  mapreduce.TransactionContext 
 (TransactionContext.java:cleanup(317)) - 
 cleanupJob(JobID=job_local557130041_0001)this=46719652
 2014-06-17 15:54:40,795 DEBUG lockmgr.DbLockManager 
 (DbLockManager.java:unlock(109)) - Unlocking id:1
 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) 
 - Going to execute query values current_timestamp
 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatLock(1402)) - Going to execute update update 
 HIVE_LOCKS set hl_last_heartbeat = 140\
 3045680796 where hl_lock_ext_id = 1
 2014-06-17 15:54:40,800 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatLock(1405)) - Going to rollback
 2014-06-17 15:54:40,804 ERROR metastore.RetryingHMSHandler 
 (RetryingHMSHandler.java:invoke(143)) - NoSuchLockException(message:No such 
 lock: 1)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1407)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:477)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:4817)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
 at com.sun.proxy.$Proxy14.unlock(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1598)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:110)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbLockManager.close(DbLockManager.java:162)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.destruct(DbTxnManager.java:300)
 at 
 org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.closeTxnManager(HiveTxnManagerImpl.java:39)
 

[jira] [Updated] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7090:
-

Attachment: HIVE-7090.2.patch

Patch v2 moves the management of the temp tables completely to the client side, 
so the changes are in HiveMetaStoreClient rather than in ObjectStore. It still 
needs more testing.


 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch


 It's common to see SQL scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see leftover tables in the 
 metastore or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...





[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()

2014-06-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041158#comment-14041158
 ] 

Eugene Koifman commented on HIVE-7249:
--

Yes, I did turn on DbTxnManager, but since we are creating an HCat-specific API, 
let me retest it once that is ready.

 HiveTxnManager.closeTxnManger() throws if called after commitTxn()
 --

 Key: HIVE-7249
 URL: https://issues.apache.org/jira/browse/HIVE-7249
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Alan Gates
 Attachments: HIVE-7249.patch



[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias

2014-06-23 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041183#comment-14041183
 ] 

Harish Butani commented on HIVE-7235:
-

+1 lgtm

 TABLESAMPLE on join table is regarded as alias
 --

 Key: HIVE-7235
 URL: https://issues.apache.org/jira/browse/HIVE-7235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7235.1.patch.txt


 {noformat}
 SELECT c_custkey, o_custkey
 FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on 
 c_custkey = o_custkey;
 {noformat}
 Fails with NPE





[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-23 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041188#comment-14041188
 ] 

David Chen commented on HIVE-7094:
--

Thanks, Carl and Sushanth!

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Fix For: 0.14.0

 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, 
 HIVE-7094.5.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.
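
 The two steps above can be sketched roughly as follows. This is only an outline of the refactoring pattern: the class names are shortened stand-ins, plain lists of strings replace the real RecordWriter, ObjectInspector, SerDe, and OutputJobInfo, and none of the signatures mirror HCatalog's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the abstract base class holding the common code.
abstract class WriterContainerSketch {
    // Common write path shared by both partitioning modes.
    final void write(String partitionValue, String record) {
        writerFor(partitionValue).add(record);
    }

    // Subclasses decide which local writer handles the record.
    protected abstract List<String> writerFor(String partitionValue);
}

// Static partitioning: a single writer receives every record.
final class StaticWriterSketch extends WriterContainerSketch {
    final List<String> single = new ArrayList<>();

    @Override
    protected List<String> writerFor(String partitionValue) {
        return single;
    }
}

// Dynamic partitioning: one writer per partition value, created lazily.
final class DynamicWriterSketch extends WriterContainerSketch {
    final Map<String, List<String>> perPartition = new HashMap<>();

    @Override
    protected List<String> writerFor(String partitionValue) {
        return perPartition.computeIfAbsent(partitionValue, k -> new ArrayList<>());
    }
}

final class WriterContainerDemo {
    public static void main(String[] args) {
        DynamicWriterSketch dyn = new DynamicWriterSketch();
        dyn.write("p=1", "a");
        dyn.write("p=2", "b");
        dyn.write("p=1", "c");
        System.out.println(dyn.perPartition.get("p=1")); // [a, c]

        StaticWriterSketch stat = new StaticWriterSketch();
        stat.write("p=1", "a");
        stat.write("p=2", "b");
        System.out.println(stat.single); // [a, b]
    }
}
```

 With this shape, write() no longer needs an if (dynamicPartitioning) branch; the choice is made once, when the concrete subclass is instantiated.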





[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041250#comment-14041250
 ] 

Eugene Koifman commented on HIVE-7090:
--

If the client fails, how does the temp table get cleaned up?

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch




[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041282#comment-14041282
 ] 

Gunther Hagleitner commented on HIVE-7159:
--

Remaining failures are unrelated.

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, 
 HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, 
 HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, 
 HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.
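
 As a small illustration of why the transformation above preserves results, the sketch below models join keys as nullable Integers and compares an inner join with and without the "is not null" filters. It is not Hive code; NotNullPushdownSketch and its methods are hypothetical, and the nested-loop join is chosen only for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative only: SQL-style equality, where a null key never matches anything.
final class NotNullPushdownSketch {

    // Inner join on key equality; emits the key once per matching pair of rows.
    static List<Integer> innerJoin(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : a) {
            for (Integer y : b) {
                if (x != null && y != null && x.equals(y)) {
                    out.add(x);
                }
            }
        }
        return out;
    }

    // "where key is not null" applied to one join source.
    static List<Integer> notNull(List<Integer> rows) {
        return rows.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, null, 2, null, null);
        List<Integer> b = Arrays.asList(null, 2, 2, 3, null);

        List<Integer> plain = innerJoin(a, b);
        List<Integer> filtered = innerJoin(notNull(a), notNull(b));

        System.out.println(plain);                  // [2, 2]
        System.out.println(plain.equals(filtered)); // true
        // Only the non-null rows would need to be shuffled to the join.
        System.out.println(notNull(a).size() + " of " + a.size() + " rows from A reach the join");
    }
}
```
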





[jira] [Updated] (HIVE-6207) Integrate HCatalog with locking

2014-06-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6207:
-

Attachment: ACIDHCatalogDesign.pdf

 Integrate HCatalog with locking
 ---

 Key: HIVE-6207
 URL: https://issues.apache.org/jira/browse/HIVE-6207
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: ACIDHCatalogDesign.pdf, HIVE-6207.4.patch


 HCatalog currently ignores any locks created by Hive users.  It should 
 respect the locks Hive creates as well as create locks itself when locking is 
 configured.





[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041304#comment-14041304
 ] 

Jason Dere commented on HIVE-7090:
--

The temp table scratch directory is deleted during session close, and is also 
marked for deletion upon process exit, which should clean up the directory in 
normal usage.
If the client dies, this cleanup does not occur and the directory is left in 
the user's scratch directory.  For HiveServer2, we could try to add a cleanup 
thread to remove old temp table directories from the scratch directory.  For 
other users like HiveCLI, there would probably not be any automated cleanup, 
similar to other files that can be left behind in the user's scratch 
directory.
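
One way such a cleanup pass might look is sketched below. It is an assumption-laden illustration, not part of the patch: StaleScratchCleaner, removeStale, and deleteTree are hypothetical names, it works on the local file system via java.nio.file rather than HDFS, and it treats "old" as "last modified before a cutoff", which is a guess at a usable policy.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical cleanup helper; a periodic thread could call removeStale().
final class StaleScratchCleaner {

    // Recursively deletes a directory tree, children before parents.
    static void deleteTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Removes direct subdirectories of scratchRoot whose last-modified time is
    // older than (now - maxAge). Returns the number of directories removed.
    static int removeStale(Path scratchRoot, Duration maxAge, Instant now) throws IOException {
        Instant cutoff = now.minus(maxAge);
        int removed = 0;
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(scratchRoot)) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)
                        && Files.getLastModifiedTime(dir).toInstant().isBefore(cutoff)) {
                    deleteTree(dir);
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("scratch");
        Path oldDir = Files.createDirectory(root.resolve("old-session"));
        Files.createDirectory(root.resolve("live-session"));
        Instant now = Instant.now();
        // Pretend the old session's directory was last touched two days ago.
        Files.setLastModifiedTime(oldDir, FileTime.from(now.minus(Duration.ofDays(2))));

        System.out.println(removeStale(root, Duration.ofDays(1), now)); // 1
        deleteTree(root); // tidy up the demo directory
    }
}
```

A modification-time policy could remove the directory of a long-running but idle session, so a real cleanup thread would likely also need to check whether the owning session is still alive.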

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch




[jira] [Commented] (HIVE-7225) Unclosed Statement's in TxnHandler

2014-06-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041306#comment-14041306
 ] 

Alan Gates commented on HIVE-7225:
--

+1

 Unclosed Statement's in TxnHandler
 --

 Key: HIVE-7225
 URL: https://issues.apache.org/jira/browse/HIVE-7225
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: steve, Oh
 Attachments: HIVE-7225.1.patch, hive-7225.3.patch


 There are several methods in TxnHandler where a Statement (local to the method) 
 is not closed upon return.
 Here are a few examples:
 In compact():
 {code}
 stmt.executeUpdate(s);
 LOG.debug("Going to commit");
 dbConn.commit();
 {code}
 In showCompact():
 {code}
   Statement stmt = dbConn.createStatement();
   String s = "select cq_database, cq_table, cq_partition, cq_state, cq_type, cq_worker_id, " +
       "cq_start, cq_run_as from COMPACTION_QUEUE";
   LOG.debug("Going to execute query <" + s + ">");
   ResultSet rs = stmt.executeQuery(s);
 {code}
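
 A common remedy for the pattern reported above is to close the Statement in a finally block. The sketch below shows that shape; it is not the attached patch. runUpdate and fakeConnection are hypothetical helpers, and the fake Connection built with java.lang.reflect.Proxy exists only so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: runUpdate is a hypothetical helper, not a method of TxnHandler.
final class CloseStatementSketch {

    // Executes one update and always closes the Statement, even if executeUpdate throws.
    static int runUpdate(Connection dbConn, String sql) throws SQLException {
        Statement stmt = null;
        try {
            stmt = dbConn.createStatement();
            int rows = stmt.executeUpdate(sql);
            dbConn.commit();
            return rows;
        } finally {
            if (stmt != null) {
                stmt.close();
            }
        }
    }

    // Builds a fake Connection whose Statement counts close() calls.
    static Connection fakeConnection(AtomicInteger closed, boolean failOnExecute) {
        ClassLoader loader = CloseStatementSketch.class.getClassLoader();
        Statement stmt = (Statement) Proxy.newProxyInstance(
            loader, new Class<?>[] {Statement.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "close":
                        closed.incrementAndGet();
                        return null;
                    case "executeUpdate":
                        if (failOnExecute) {
                            throw new SQLException("simulated failure");
                        }
                        return 1;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
        return (Connection) Proxy.newProxyInstance(
            loader, new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "createStatement":
                        return stmt;
                    case "commit":
                        return null;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        AtomicInteger closed = new AtomicInteger();
        runUpdate(fakeConnection(closed, false), "delete from T");
        try {
            runUpdate(fakeConnection(closed, true), "delete from T");
        } catch (SQLException expected) {
            // The statement must still have been closed on the failure path.
        }
        System.out.println(closed.get()); // 2
    }
}
```

 On Java 7 and later, try-with-resources expresses the same guarantee more compactly; the explicit finally block is shown because it also works on older Java versions.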





[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~rhbutani]!

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-7159.1.patch, HIVE-7159.10.patch, 
 HIVE-7159.11.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, 
 HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, HIVE-7159.8.patch, 
 HIVE-7159.9.patch




[jira] [Commented] (HIVE-6469) skipTrash option in hive command line

2014-06-23 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041372#comment-14041372
 ] 

Ravi Prakash commented on HIVE-6469:


Can folks watching this JIRA please review HIVE-7100 which now has a patch? 
Would that be an acceptable option instead of this?

 skipTrash option in hive command line
 -

 Key: HIVE-6469
 URL: https://issues.apache.org/jira/browse/HIVE-6469
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
Assignee: Jayesh
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6469.1.patch, HIVE-6469.2.patch, HIVE-6469.3.patch, 
 HIVE-6469.patch


 The current behavior of the Hive metastore during a "drop table table_name" 
 command is to delete the data from the HDFS warehouse and put it into Trash.
 Currently there is no way to provide a flag to tell the warehouse to skip 
 the trash while deleting table data.
 This ticket is to add a skipTrash configuration, hive.warehouse.data.skipTrash, 
 which, when set to true, will skip the trash while dropping table data from the 
 HDFS warehouse. It will be set to false by default to keep the current behavior.
 This would be a good feature to add, so that an admin of the cluster can 
 specify when not to put data into the trash directory (e.g., in a dev 
 environment) and thus avoid filling HDFS space, instead of relying on the trash 
 interval and policy configuration to take care of the disk-filling issue.
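
 The proposed flag could be modeled as in the sketch below. Only the property name hive.warehouse.data.skipTrash and its default of false come from this ticket; SkipTrashSketch, its methods, and the use of java.util.Properties in place of Hive's configuration classes are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical model of the proposed flag; it does not use Hive's or Hadoop's real classes.
final class SkipTrashSketch {
    static final String SKIP_TRASH_KEY = "hive.warehouse.data.skipTrash";

    // Defaults to false so that the existing behavior (move to trash) is kept.
    static boolean skipTrash(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SKIP_TRASH_KEY, "false"));
    }

    // Describes what would happen to a dropped table's data directory.
    static String dropTableData(Properties conf, String path) {
        return skipTrash(conf) ? "delete permanently: " + path : "move to trash: " + path;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        System.out.println(dropTableData(defaults, "/warehouse/t1")); // move to trash: /warehouse/t1

        Properties dev = new Properties();
        dev.setProperty(SKIP_TRASH_KEY, "true");
        System.out.println(dropTableData(dev, "/warehouse/t1")); // delete permanently: /warehouse/t1
    }
}
```
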





[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041378#comment-14041378
 ] 

Eugene Koifman commented on HIVE-7090:
--

In that case it may make sense to generate unique names for artifacts that may 
be left over.  The initial description in this ticket mentions third-party tools 
that will use this feature. I imagine they will generate the same temp table 
name each time, which may cause weird failures after a crash.

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch




[jira] [Created] (HIVE-7275) optimize these functions for windowing function.

2014-06-23 Thread Kiet Ly (JIRA)
Kiet Ly created HIVE-7275:
-

 Summary: optimize these functions for windowing function.
 Key: HIVE-7275
 URL: https://issues.apache.org/jira/browse/HIVE-7275
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.12.0, 0.11.0
 Environment: Hadoop 2.4.0, Hive 13.0
Reporter: Kiet Ly


Please apply the window streaming optimization from issue HIVE-7143/7062 to 
these functions if they are applicable.

row_number 
count 
rank 
dense_rank  
nvl 
cast  
decode  
median  
stddev  
coalesce  
floor  
sign  
abs  
ltrim  
substring  
to_char 







[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041399#comment-14041399
 ] 

Jason Dere commented on HIVE-7090:
--

Yes, good point.  The patch actually does this: each session will have its own 
scratch directory for temp tables, named using the session ID (a UUID).  Within the 
session's temp table scratch directory, each created temp table will get its 
own directory, also named using a UUID.
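
The directory layout described in this comment might look roughly like the sketch below. It is not taken from the patch: TempTableDirsSketch and its methods are hypothetical, and the local file system (java.nio.file) stands in for the real scratch directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical layout helper: <scratchRoot>/<session UUID>/<table UUID>.
final class TempTableDirsSketch {
    private final Path sessionDir;

    // One scratch directory per session, named after the session's UUID.
    TempTableDirsSketch(Path scratchRoot, UUID sessionId) throws IOException {
        this.sessionDir = Files.createDirectories(scratchRoot.resolve(sessionId.toString()));
    }

    // Each temp table gets its own UUID-named directory, independent of the table name.
    Path createTempTableDir() throws IOException {
        return Files.createDirectory(sessionDir.resolve(UUID.randomUUID().toString()));
    }

    Path sessionDir() {
        return sessionDir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("scratch");
        TempTableDirsSketch s1 = new TempTableDirsSketch(root, UUID.randomUUID());
        TempTableDirsSketch s2 = new TempTableDirsSketch(root, UUID.randomUUID());

        // Two sessions creating a temp table with the same user-visible name
        // still end up in different directories.
        Path t1 = s1.createTempTableDir();
        Path t2 = s2.createTempTableDir();
        System.out.println(t1.equals(t2)); // false
        System.out.println(t1.getParent().equals(s1.sessionDir())); // true
    }
}
```

Because the directory names never depend on the table name, a leftover directory from a crashed client cannot collide with a later temp table of the same name.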

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch




[jira] [Updated] (HIVE-7266) Optimized HashTable with vectorized map-joins results in String columns extending

2014-06-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7266:
--

Assignee: Matt McCline  (was: Jitendra Nath Pandey)

 Optimized HashTable with vectorized map-joins results in String columns 
 extending
 -

 Key: HIVE-7266
 URL: https://issues.apache.org/jira/browse/HIVE-7266
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Matt McCline
 Attachments: hive-7266-small-test.tgz


 The following query returns different results when both vectorized mapjoin 
 and the new optimized hashtable are enabled.
 {code}
 hive> set hive.vectorized.execution.enabled=false;
 hive> select s_suppkey, n_name from supplier, nation where s_nationkey = 
 n_nationkey limit 25;
 ...
 316869  JAPAN
 1636869 RUSSIA
 1096869 IRAN
 7236869 RUSSIA
 2276869 INDIA
 8516869 ARGENTINA
 2636869 MOZAMBIQUE
 3836869 ROMANIA
 2616869 FRANCE
 {code}
 But when vectorization is enabled, the results are 
 {code}
 316869  JAPAN
 1636869 RUSSIA
 1096869 IRANIA
 7236869 RUSSIA
 2276869 INDIAA
 8516869 ARGENTINA
 2636869 MOZAMBIQUE
 3836869 ROMANIAQUE
 2616869 FRANCEAQUE
 {code}
 It works correctly with vectorization when the new optimized map-join 
 hashtable is disabled:
 {code}
 hive> set hive.vectorized.execution.enabled=true;
 hive> set hive.mapjoin.optimized.hashtable=false;
 hive> select s_suppkey, n_name from supplier, nation where s_nationkey = 
 n_nationkey limit 25;
 316869  JAPAN
 1636869 RUSSIA
 1096869 IRAN
 7236869 RUSSIA
 2276869 INDIA
 8516869 ARGENTINA
 2636869 MOZAMBIQUE
 3836869 ROMANIA
 2616869 FRANCE
 {code}
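
 The report does not state a root cause. Purely as an illustration of how output like IRANIA or ROMANIAQUE can arise, the sketch below reuses one byte buffer for successive values and, in the faulty mode, reads it back with a length that only ever grows. StaleLengthSketch and its method are hypothetical and are not taken from Hive's vectorization or hashtable code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative guess at a failure pattern; not Hive's actual code.
final class StaleLengthSketch {

    // Copies each value into one shared buffer and reads it back as a string.
    // When trackLength is false, the reader uses the longest length seen so far
    // instead of the current value's length, so stale trailing bytes leak through.
    static List<String> readThroughSharedBuffer(List<String> values, boolean trackLength) {
        byte[] buffer = new byte[64];
        int longestSoFar = 0;
        List<String> out = new ArrayList<>();
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(bytes, 0, buffer, 0, bytes.length);
            longestSoFar = Math.max(longestSoFar, bytes.length);
            int length = trackLength ? bytes.length : longestSoFar;
            out.add(new String(buffer, 0, length, StandardCharsets.US_ASCII));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("JAPAN", "RUSSIA", "IRAN", "RUSSIA", "INDIA",
            "ARGENTINA", "MOZAMBIQUE", "ROMANIA", "FRANCE");
        System.out.println(readThroughSharedBuffer(names, true));
        System.out.println(readThroughSharedBuffer(names, false));
    }
}
```

 With the nine country names from the report as input, the faulty mode yields the same strings as the vectorized output shown above, which suggests (but does not prove) that a stale length or a reused buffer is involved.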





[jira] [Updated] (HIVE-7090) Support session-level temporary tables in Hive

2014-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7090:
-

Attachment: HIVE-7090.3.patch

rebase with trunk

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch




[jira] [Commented] (HIVE-6893) out of sequence error in HiveMetastore server

2014-06-23 Thread Gilad Wolff (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041510#comment-14041510
 ] 

Gilad Wolff commented on HIVE-6893:
---

I encountered the same issue: we get a socket read timeout and then an 
out-of-sequence error. In one case we got an OOM in our client, and I suspect 
it's the same underlying issue. Here is the metastore sequence of events. Our 
client tried to drop a table starting at 14:02:25. Note that we use a 20-second 
timeout for our client:
{code}
2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 
11: source:/10.20.93.47 drop_table : 
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,181 INFO 
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue  
ip=/10.20.93.47 cmd=source:/10.20.93.47 drop_table : 
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE 
2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 
11: source:/10.20.93.47 get_table : 
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,182 INFO 
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue  
ip=/10.20.93.47 cmd=source:/10.20.93.47 get_table : 
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE  
2014-06-23 14:02:46,596 INFO hive.metastore.hivemetastoressimpl: deleting  
hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
2014-06-23 14:02:46,694 INFO hive.metastore.hivemetastoressimpl: Moved to 
trash: 
hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
{code}

On our client we get a socket timeout for the drop table call at 14:02:45:
{code}
2:02:45.209 PM  WARN
com.cloudera.cmon.firehose.polling.hive.HiveMetastoreCanary Metastore 
HIVE-1-HIVEMETASTORE-627a77825bb851bf2db30317a698dded: Failed to drop table 
com.cloudera.cmf.cdhclient.common.hive.MetaException: 
java.net.SocketTimeoutException: Read timed out
{code}
We then try to drop the database immediately afterwards, and the next message 
in our logs is:
{code}
2:02:46.697 PM  WARN  com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl
Could not drop hive database: cloudera_manager_metastore_canary_test_db
com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TApplicationException: get_database failed: out of sequence response
    at com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:479)
    at com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl.dropDatabase(MetastoreClientImpl.java:160)
{code}

Note that the moved-to-trash message in the Hive metastore is from 14:02:46,694 
and the out-of-sequence exception is from 2:02:46.697. I know that order in time 
does not imply causation, but is it possible that we are getting the drop-table 
acknowledgment message instead of the get_database response?
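
The hypothesis above can be illustrated with a small simulation. This is only a sketch, not Thrift's or Hive's actual code: `SeqClientSketch`, `Reply`, and the in-memory queue standing in for the socket are all hypothetical. It assumes a client that tags each request with a sequence id and rejects any reply whose id does not match, which is the kind of check that would produce an "out of sequence response" error.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a sequence-checked RPC client; not the Thrift API.
class SeqClientSketch {
    // A reply as it would sit in the socket buffer: the id of the call it answers.
    static final class Reply {
        final int seqId;
        final String method;

        Reply(int seqId, String method) {
            this.seqId = seqId;
            this.method = method;
        }
    }

    private final Deque<Reply> socket = new ArrayDeque<>(); // stands in for the connection
    private int nextSeqId = 0;

    // Sends a request and returns the sequence id assigned to it.
    int send(String method) {
        return ++nextSeqId;
    }

    // The server answers a call, possibly after the client has given up on it.
    void serverReplies(int seqId, String method) {
        socket.addLast(new Reply(seqId, method));
    }

    // Reads the next reply; returns null to model a read timeout.
    String receive(int expectedSeqId, String method) {
        Reply reply = socket.pollFirst();
        if (reply == null) {
            return null; // timed out, but the call may still complete on the server
        }
        if (reply.seqId != expectedSeqId) {
            throw new IllegalStateException(method + " failed: out of sequence response");
        }
        return reply.method + " ok";
    }

    public static void main(String[] args) {
        SeqClientSketch client = new SeqClientSketch();
        int dropId = client.send("drop_table");
        System.out.println(client.receive(dropId, "drop_table")); // null: the read timed out

        client.serverReplies(dropId, "drop_table"); // the late acknowledgment arrives
        int getId = client.send("get_database");    // same connection is reused
        try {
            client.receive(getId, "get_database");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the second call fails because the stale drop_table reply is read first. Whether that is what happens in the metastore client here is not established by the logs above.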

 out of sequence error in HiveMetastore server
 -

 Key: HIVE-6893
 URL: https://issues.apache.org/jira/browse/HIVE-6893
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Romain Rigaux
Assignee: Naveen Gangam
 Fix For: 0.14.0

 Attachments: HIVE-6893.1.patch


 Calls listing databases or tables fail. It seems to be a concurrency problem.
 {code}
 2014-03-06 05:34:00,785 ERROR hive.log: 
 org.apache.thrift.TApplicationException: get_databases failed: out of sequence response
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:472)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:459)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:648)
 at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:66)
 at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:278)
 at 

[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-23 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041525#comment-14041525
 ] 

Szehon Ho commented on HIVE-7257:
-

[~wilbur.yang], thanks for this patch. I do have one concern: the query
{noformat}SELECT format_number(CAST(12332.123456 AS FLOAT), 4),{noformat}

shows a result like:
{noformat}12,332.1230{noformat}

It doesn't look correct, unless I'm missing something? I would expect 
12332.1235, as it shows for decimal.

 UDF format_number() does not work on FLOAT types
 

 Key: HIVE-7257
 URL: https://issues.apache.org/jira/browse/HIVE-7257
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
 Attachments: HIVE-7257.1.patch


 #1 Show the table:
 hive> describe ssga3; 
 OK
 source  string  
 test  float   
 dt  timestamp   
 Time taken: 0.243 seconds
 #2 Run format_number on double and it works:
 hive> select format_number(cast(test as double),2) from ssga3;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0009, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0009
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 
 1.47 sec
 MapReduce Total cumulative CPU time: 1 seconds 470 msec
 Ended Job = job_201403131616_0009
 MapReduce Jobs Launched: 
 Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
 Total MapReduce CPU Time Spent: 1 seconds 470 msec
 OK
 1.00
 2.00
 Time taken: 16.563 seconds
 #3 Run format_number on float and it does not work
 hive> select format_number(test,2) from ssga3; 
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0010, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0010
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
 Ended Job = job_201403131616_0010 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Examining task ID: task_201403131616_0010_m_02 (and more) from job 
 job_201403131616_0010
 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid 
 host:port authority: logicaljt
 Task with the most failures(4):
 Task ID:
 task_201403131616_0010_m_00
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
 ..
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec





[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2597:


Attachment: HIVE-2597.4.patch.txt

Updated XML results

 Repeated key in GROUP BY is erroneously displayed when using DISTINCT
 -

 Key: HIVE-2597
 URL: https://issues.apache.org/jira/browse/HIVE-2597
 Project: Hive
  Issue Type: Bug
Reporter: Alex Rovner
Assignee: Navis
 Attachments: HIVE-2597.3.patch.txt, HIVE-2597.4.patch.txt, 
 HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch


 The following query was simplified for illustration purposes. 
 This works correctly:
 select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
 client_tid
 The intent here is to produce two empty columns in between data.
 The following query does not work:
 select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients 
 cluster by client_tid
 FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 
 The key is not repeated since the aliases were given. Seems like Hive is 
 ignoring the aliases when the distinct keyword is specified.





Review Request 22901: Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22901/
---

Review request for hive.


Bugs: HIVE-2597
https://issues.apache.org/jira/browse/HIVE-2597


Repository: hive-git


Description
---

The following query was simplified for illustration purposes. 

This works correctly:
select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
client_tid

The intent here is to produce two empty columns in between data.

The following query does not work:
select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients cluster 
by client_tid

FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 

The key is not repeated since the aliases were given. Seems like Hive is 
ignoring the aliases when the distinct keyword is specified.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8ae1c73 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cb284d7 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java c60f56f 
  ql/src/test/queries/clientpositive/groupby_duplicate_key.q PRE-CREATION 
  ql/src/test/results/clientpositive/groupby_duplicate_key.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml af100ed 
  ql/src/test/results/compiler/plan/groupby4.q.xml 1822733 
  ql/src/test/results/compiler/plan/groupby5.q.xml 0bfc684 
  ql/src/test/results/compiler/plan/groupby6.q.xml 5b3696c 
  ql/src/test/results/compiler/plan/join1.q.xml e88d5dd 
  ql/src/test/results/compiler/plan/join2.q.xml 11c44c7 
  ql/src/test/results/compiler/plan/join3.q.xml 6fde4e0 
  ql/src/test/results/compiler/plan/join4.q.xml 22a4911 
  ql/src/test/results/compiler/plan/join5.q.xml 5033366 
  ql/src/test/results/compiler/plan/join6.q.xml b1185a9 
  ql/src/test/results/compiler/plan/join7.q.xml a1ab3e6 
  ql/src/test/results/compiler/plan/join8.q.xml ba128d4 

Diff: https://reviews.apache.org/r/22901/diff/


Testing
---


Thanks,

Navis Ryu



Review Request 22902: SerDe Properties are not considered by show create table Command

2014-06-23 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22902/
---

Review request for hive.


Bugs: HIVE-7270
https://issues.apache.org/jira/browse/HIVE-7270


Repository: hive-git


Description
---

The Hive table DDL generated by the show create table target_table command 
does not contain the SerDe properties of the target table, even though the 
table has specific SerDe properties. 


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java fad5ed3 
  ql/src/test/queries/clientpositive/show_create_table_serde.q a3eb5a8 
  ql/src/test/results/clientpositive/show_create_table_serde.q.out a9e92b4 

Diff: https://reviews.apache.org/r/22902/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-23 Thread Wilbur Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041533#comment-14041533
 ] 

Wilbur Yang commented on HIVE-7257:
---

[~szehon], thanks for the review. That particular case seems to be a quirk with 
floats -- I [tested 
System.out.println((float)12332.123456);|http://ideone.com/oP4NDJ] and it 
prints 12332.123. I suppose the question now is whether or not we want it to 
behave like this.
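
The float behavior discussed above can be reproduced in plain Java. The sketch below is not the UDF's implementation: `FloatFormatSketch` and `format4` are hypothetical names, and it assumes that formatting with `java.text.DecimalFormat` is a reasonable stand-in for what format_number does with four decimal places.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FloatFormatSketch {
    // Formats with grouping separators and exactly four decimal places.
    static String format4(double value) {
        DecimalFormat df = new DecimalFormat(
                "#,##0.0000", DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        float f = (float) 12332.123456;   // nearest float is 12332.123046875
        System.out.println(f);            // 12332.123
        System.out.println(format4(f));   // 12,332.1230 (float widened to double)
        System.out.println(format4(12332.123456)); // 12,332.1235
    }
}
```

A float has a 24-bit significand, so near 12332 adjacent values are 2^-10 (about 0.00098) apart. The digits after the third decimal place are therefore lost before any formatting happens.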






[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7205:


Status: Patch Available  (was: Open)

 Wrong results when union all of grouping followed by group by with 
 correlation optimization
 ---

 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: dima machlin
Assignee: Navis
Priority: Critical
 Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt


 Use case:
 Table TBL (a string, b string) contains a single row: 'a','a'
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns 
 a 1
 a 1
 when hive.optimize.correlation=true is set.
 If we change it to set hive.optimize.correlation=false;
 it returns the correct result: a 2
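
 The expected result can be checked by hand with a small sketch that mirrors the query's semantics in plain Java. It is not Hive code: `UnionGroupBySketch` and its methods are hypothetical, and it models only the single-row table described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UnionGroupBySketch {
    // Counts rows per key, like "select k, count(1) as cc from TBL group by k".
    static Map<String, Long> countBy(List<String[]> rows, int column) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.merge(row[column], 1L, Long::sum);
        }
        return counts;
    }

    // Sums cc per key over both branches, like the outer "group by b" over the UNION ALL.
    static Map<String, Long> unionAllThenSum(Map<String, Long> first, Map<String, Long> second) {
        Map<String, Long> sums = new LinkedHashMap<>(first);
        second.forEach((key, cc) -> sums.merge(key, cc, Long::sum));
        return sums;
    }

    public static void main(String[] args) {
        List<String[]> tbl = new ArrayList<>();
        tbl.add(new String[] {"a", "a"}); // the single row: a = 'a', b = 'a'

        Map<String, Long> byB = countBy(tbl, 1); // first branch: group by b
        Map<String, Long> byA = countBy(tbl, 0); // second branch: group by a, aliased as b
        System.out.println(unionAllThenSum(byB, byA)); // {a=2}
    }
}
```

 Both branches produce the row (a, 1), so the outer aggregation should merge them into a single row (a, 2), which matches the result seen with the optimization disabled.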
 The plan with correlation optimization :
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 null-subquery1:z-subquery1:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: b
 type: string
   outputColumnNames: b
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: b
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 0
   value expressions:
 expr: _col1
 type: bigint
 null-subquery2:z-subquery2:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: a
 type: string
   outputColumnNames: a
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: a
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 1
   value expressions:
 expr: _col1
 type: bigint
   Reduce Operator Tree:
 Demux Operator
   Group By Operator
 aggregations:
   expr: count(VALUE._col0)
 bucketGroup: false
 keys:
   expr: KEY._col0
   type: string
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   outputColumnNames: _col0, _col1
   Union
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   

[jira] [Updated] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7237:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Ashutosh, for the review.

 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
 -

 Key: HIVE-7237
 URL: https://issues.apache.org/jira/browse/HIVE-7237
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.13.0
 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY
Reporter: Douglas Moore
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7237.1.patch.txt, HIVE-7237.2.patch.txt


 Setting hive.exec.parallel=true causes the YARN application instance to 
 linger forever. With hive.exec.parallel=false, the application goes away as 
 soon as the Hive query is complete. The underlying table is an ORC 
 store_sales table compressed with SNAPPY.
 {code}
 set hive.exec.parallel=true;
 select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825
 {code}
 The query will run under Tez and finish in < 30 seconds.
 After 30-40 of these jobs, the cluster gets to a point where no jobs will 
 finish.





[jira] [Updated] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7211:


Attachment: HIVE-7211.4.patch.txt

 Throws exception if the name of conf var starts with hive. does not exists 
 in HiveConf
 

 Key: HIVE-7211
 URL: https://issues.apache.org/jira/browse/HIVE-7211
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7211.1.patch.txt, HIVE-7211.2.patch.txt, 
 HIVE-7211.3.patch.txt, HIVE-7211.4.patch.txt


 Some typos in configurations are very hard to find.





Review Request 22903: Extend join transitivity PPD to non-column expressions

2014-06-23 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22903/
---

Review request for hive.


Bugs: HIVE-7111
https://issues.apache.org/jira/browse/HIVE-7111


Repository: hive-git


Description
---

Join transitivity in PPD (predicate pushdown) supports only column 
expressions, but it can be extended to generic expressions.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java f293c43 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java f7a3f1c 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 7aaf455 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicatePushDown.java e0d6aaf 
  ql/src/java/org/apache/hadoop/hive/ql/ppd/PredicateTransitivePropagate.java 1476e1a 
  ql/src/test/queries/clientpositive/auto_join33.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_join33.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/22903/diff/


Testing
---


Thanks,

Navis Ryu



Review Request 22904: Throws exception if the name of conf var starts with hive. does not exists in HiveConf

2014-06-23 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22904/
---

Review request for hive.


Bugs: HIVE-7211
https://issues.apache.org/jira/browse/HIVE-7211


Repository: hive-git


Description
---

Some typos in configurations are very hard to find.
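
The idea in the title can be sketched as follows. This is an illustration only, not the code in the patch: `ConfNameCheckSketch`, `KNOWN`, and `validate` are hypothetical names, and the two registered properties are simply ones mentioned elsewhere in this thread. It assumes the check is "reject any name that starts with hive. but is not a registered variable".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ConfNameCheckSketch {
    // Hypothetical registry of known names; a real list would come from HiveConf.
    private static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "hive.exec.parallel", "hive.optimize.correlation"));

    // Rejects names in the "hive." namespace that are not registered.
    static void validate(String name) {
        if (name.startsWith("hive.") && !KNOWN.contains(name)) {
            throw new IllegalArgumentException("Unknown configuration variable: " + name);
        }
    }

    public static void main(String[] args) {
        validate("hive.exec.parallel"); // registered, accepted
        validate("mapred.job.name");    // outside the hive. namespace, not checked
        try {
            validate("hive.exec.paralel"); // typo, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this, a misspelled property fails immediately instead of being silently stored and ignored.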


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7932a3d 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 7b91e1d 
  hbase-handler/src/test/queries/positive/hbase_stats.q 52efef5 
  hbase-handler/src/test/queries/positive/hbase_stats2.q 520e003 
  hbase-handler/src/test/queries/positive/hbase_stats3.q c3134f0 
  hbase-handler/src/test/results/positive/hbase_stats2.q.out 80e1c6d 
  hbase-handler/src/test/results/positive/hbase_stats3.q.out ce7dda4 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/SpecialCases.java 0c1fa23 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/rcfile/RCFileMapReduceOutputFormat.java b09ab4c 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java 9a89980 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/VerifyOverriddenConfigsHook.java 41c178a 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/DummyStatsAggregator.java 1bafd97 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/stats/DummyStatsPublisher.java 4dd632d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5e5cf97 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 179ad29 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 3bc7e43 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java 953d9b4 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java ffd7597 
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java 257f186 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a988b44 
  ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 9b24bfd 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 464bd5e 
  ql/src/test/queries/clientpositive/dbtxnmgr_compact1.q 6612fe8 
  ql/src/test/queries/clientpositive/dbtxnmgr_compact2.q 599cad9 
  ql/src/test/queries/clientpositive/dbtxnmgr_compact3.q 871d292 
  ql/src/test/queries/clientpositive/dbtxnmgr_showlocks.q 7c71fdd 
  ql/src/test/queries/clientpositive/index_bitmap_compression.q 4e93275 
  ql/src/test/queries/clientpositive/index_compression.q 1bb29a5 
  ql/src/test/queries/clientpositive/join25.q 75f542d 
  ql/src/test/queries/clientpositive/join36.q dd99d44 
  ql/src/test/queries/clientpositive/join37.q dc57d3a 
  ql/src/test/queries/clientpositive/join_nulls.q 6c8ad10 
  ql/src/test/queries/clientpositive/join_nullsafe.q 7c3d1e8 
  ql/src/test/queries/clientpositive/metadata_export_drop.q e2da61a 
  ql/src/test/queries/clientpositive/overridden_confs.q 9dcaed6 
  ql/src/test/queries/clientpositive/quotedid_skew.q 5c95967 
  ql/src/test/queries/clientpositive/skewjoin_union_remove_1.q fc07742 
  ql/src/test/queries/clientpositive/skewjoin_union_remove_2.q 50cfc61 
  ql/src/test/queries/clientpositive/skewjoinopt1.q 504ba8b 
  ql/src/test/queries/clientpositive/skewjoinopt10.q f35af90 
  ql/src/test/queries/clientpositive/skewjoinopt11.q 9e00bdc 
  ql/src/test/queries/clientpositive/skewjoinopt12.q 1719950 
  ql/src/test/queries/clientpositive/skewjoinopt13.q 5ef217c 
  ql/src/test/queries/clientpositive/skewjoinopt14.q df1a26b 
  ql/src/test/queries/clientpositive/skewjoinopt15.q 1db5472 
  ql/src/test/queries/clientpositive/skewjoinopt16.q 915de61 
  ql/src/test/queries/clientpositive/skewjoinopt17.q 2ee79cc 
  ql/src/test/queries/clientpositive/skewjoinopt18.q 9d06cc0 
  ql/src/test/queries/clientpositive/skewjoinopt19.q 075645f 
  ql/src/test/queries/clientpositive/skewjoinopt2.q f7acaad 
  ql/src/test/queries/clientpositive/skewjoinopt20.q 9b908ce 
  ql/src/test/queries/clientpositive/skewjoinopt3.q 22ea4f0 
  ql/src/test/queries/clientpositive/skewjoinopt4.q 8496b1a 
  ql/src/test/queries/clientpositive/skewjoinopt5.q 152de5b 
  ql/src/test/queries/clientpositive/skewjoinopt6.q 2e261bd 
  ql/src/test/queries/clientpositive/skewjoinopt7.q e4d9605 
  ql/src/test/queries/clientpositive/skewjoinopt8.q 85746d9 
  ql/src/test/queries/clientpositive/skewjoinopt9.q 889ab6c 
  ql/src/test/queries/clientpositive/smb_mapjoin_25.q e43174b 
  ql/src/test/queries/clientpositive/stats15.q 9a557c6 
  ql/src/test/queries/clientpositive/truncate_table.q 975c0f1 
  ql/src/test/queries/clientpositive/udtf_explode.q 1d405b3 
  ql/src/test/queries/clientpositive/vector_decimal_mapjoin.q d8b3d1a 
  ql/src/test/queries/clientpositive/vectorized_bucketmapjoin1.q e309713 
  ql/src/test/queries/clientpositive/vectorized_mapjoin.q f390c2c 
  ql/src/test/queries/clientpositive/vectorized_nested_mapjoin.q ce4227c 
 

[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys

2014-06-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041575#comment-14041575
 ] 

Gopal V commented on HIVE-7232:
---

[~ashutoshc]: Yes, I will review this today.

 VectorReduceSink is emitting incorrect JOIN keys
 

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, 
 q5.explain.txt, q5.sql


 After HIVE-7121, tpc-h query5 has resulted in incorrect results.
 Thanks to [~navis], it has been tracked down to the auto-parallel settings 
 which were initialized for ReduceSinkOperator, but not for 
 VectorReduceSinkOperator. The vector version inherits, but doesn't call 
 super.initializeOp() or set up the variable correctly from ReduceSinkDesc.
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}
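
 The initialization gap described above (a subclass that overrides the init method without calling the parent's) can be shown in isolation. This is a simplified sketch, not the Hive operator code: `ReduceSinkLike`, `VectorWithoutSuper`, `VectorWithSuper`, the `autoParallel` field, and the one-argument `initializeOp(boolean)` signature are all hypothetical stand-ins.

```java
class InitSketch {
    // Stand-in for the parent operator that owns the auto-parallel setting.
    static class ReduceSinkLike {
        protected boolean autoParallel; // stays false unless this class's initializeOp runs

        void initializeOp(boolean autoParallelFromDesc) {
            this.autoParallel = autoParallelFromDesc;
        }

        boolean isAutoParallel() {
            return autoParallel;
        }
    }

    // Overrides the init method but never calls the parent version.
    static class VectorWithoutSuper extends ReduceSinkLike {
        @Override
        void initializeOp(boolean autoParallelFromDesc) {
            // vector-specific setup would go here; the parent's field is never set
        }
    }

    // Same override, but delegates to the parent first.
    static class VectorWithSuper extends ReduceSinkLike {
        @Override
        void initializeOp(boolean autoParallelFromDesc) {
            super.initializeOp(autoParallelFromDesc);
            // vector-specific setup would follow
        }
    }

    public static void main(String[] args) {
        ReduceSinkLike broken = new VectorWithoutSuper();
        broken.initializeOp(true);
        System.out.println(broken.isAutoParallel()); // false: the setting was dropped

        ReduceSinkLike fixed = new VectorWithSuper();
        fixed.initializeOp(true);
        System.out.println(fixed.isAutoParallel()); // true
    }
}
```

 In this model, two operators configured identically end up with different settings, which is the kind of mismatch that could make the two sides of a join disagree.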





[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-23 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041576#comment-14041576
 ] 

Szehon Ho commented on HIVE-7257:
-

Ah, I guess that is the most bits that fit into a float. I'm OK with the change 
then, +1.

 UDF format_number() does not work on FLOAT types
 

 Key: HIVE-7257
 URL: https://issues.apache.org/jira/browse/HIVE-7257
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
 Attachments: HIVE-7257.1.patch


 #1 Show the table:
 hive> describe ssga3; 
 OK
 source  string
 test    float
 dt      timestamp
 Time taken: 0.243 seconds
 #2 Run format_number on double and it works:
 hive> select format_number(cast(test as double),2) from ssga3;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0009, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0009
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 
 1.47 sec
 MapReduce Total cumulative CPU time: 1 seconds 470 msec
 Ended Job = job_201403131616_0009
 MapReduce Jobs Launched: 
 Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
 Total MapReduce CPU Time Spent: 1 seconds 470 msec
 OK
 1.00
 2.00
 Time taken: 16.563 seconds
 #3 Run format_number on float and it does not work
 hive> select format_number(test,2) from ssga3; 
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0010, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0010
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
 Ended Job = job_201403131616_0010 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Examining task ID: task_201403131616_0010_m_02 (and more) from job 
 job_201403131616_0010
 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid 
 host:port authority: logicaljt
 Task with the most failures(4):
 Task ID:
 task_201403131616_0010_m_00
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
 ..
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
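
 The cast to double in step #2 is lossless, because every single-precision value is exactly representable as a double. The sketch below illustrates that in Python; `to_float32` and this `format_number` are hypothetical helpers written for the illustration (the latter only mimics grouping separators plus a fixed number of decimals) and are not Hive's UDF implementation.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to IEEE 754 single precision, then widen it back to a double."""
    return struct.unpack("f", struct.pack("f", x))[0]


def format_number(x: float, d: int) -> str:
    """Illustrative analogue: thousands separators and d decimal places."""
    return format(x, f",.{d}f")


# Values like 1.0 and 2.0 are exact in single precision, so widening
# them to double and formatting gives the output seen in step #2.
one = format_number(to_float32(1.0), 2)
two = format_number(to_float32(2.0), 2)

# A value such as 1.1 is not exact in single precision; widening keeps
# the float's bits, so the double differs slightly from the literal 1.1.
widened = to_float32(1.1)
```

 The small difference in `widened` is invisible at two decimal places, which is consistent with treating a FLOAT argument as a DOUBLE before formatting.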





[jira] [Updated] (HIVE-7194) authorization_ctas.q failing on trunk

2014-06-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7194:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Thejas & Ashutosh.

 authorization_ctas.q failing on trunk
 -

 Key: HIVE-7194
 URL: https://issues.apache.org/jira/browse/HIVE-7194
 Project: Hive
  Issue Type: Task
  Components: Authorization
Reporter: Ashutosh Chauhan
Assignee: Thejas M Nair
 Fix For: 0.14.0

 Attachments: HIVE-7194.1.patch.txt, HIVE-7194.patch


 Need to update .q.out file





[jira] [Updated] (HIVE-7271) Speed up unit tests

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7271:
-

Status: Open  (was: Patch Available)

 Speed up unit tests
 ---

 Key: HIVE-7271
 URL: https://issues.apache.org/jira/browse/HIVE-7271
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, 
 HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch


 Did some experiments to see if there's a way to speed up unit tests. 
 TestCliDriver seemed to take a lot of time just spinning up/tearing down 
 JVMs. I was also curious to see if running everything on a ram disk would 
 help.
 Results (I ran tests up to authorization_2):
 - Current setup: 40 minutes
 - Single JVM (not using child JVM to run all queries): 8 minutes
 - Single JVM + ram disk: 7 minutes
 So the ram disk didn't help that much. But running tests in single JVM seems 
 worthwhile doing.
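
 The relative gains implied by the timings above can be worked out directly. This is only arithmetic on the three reported numbers; the dictionary keys are labels chosen for this sketch.

```python
# Wall-clock times reported above, in minutes.
timings_min = {
    "current": 40,
    "single_jvm": 8,
    "single_jvm_ramdisk": 7,
}


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


single_jvm_speedup = speedup(timings_min["current"], timings_min["single_jvm"])
ramdisk_speedup = speedup(timings_min["current"], timings_min["single_jvm_ramdisk"])

# Fraction of the single-JVM run time saved by adding the ram disk.
ramdisk_saving = 1 - timings_min["single_jvm_ramdisk"] / timings_min["single_jvm"]
```

 The single JVM accounts for a 5x speedup on its own, while the ram disk trims a further 12.5% off the already reduced run time.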





[jira] [Updated] (HIVE-7271) Speed up unit tests

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7271:
-

Attachment: HIVE-7271.6.patch

.6 fixes test failures (golden files again). Also includes the renamed methods 
[~szehon] asked for.



[jira] [Updated] (HIVE-7271) Speed up unit tests

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7271:
-

Status: Patch Available  (was: Open)



[jira] [Commented] (HIVE-7271) Speed up unit tests

2014-06-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041634#comment-14041634
 ] 

Brock Noland commented on HIVE-7271:


+1 LGTM

Regardless of the memory item, I updated our instance types since the c3 
instances have a faster CPU.



[jira] [Updated] (HIVE-7274) Update PTest2 to JClouds 1.7.3

2014-06-23 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7274:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you for the review Szehon! I have committed this to trunk.

 Update PTest2 to JClouds 1.7.3
 --

 Key: HIVE-7274
 URL: https://issues.apache.org/jira/browse/HIVE-7274
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7274.patch


 Required to use newer instance types





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2014-06-23 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041657#comment-14041657
 ] 

Bing Li commented on HIVE-4577:
---

Hi, [~thejas]
Thank you for your comments.
I tried StrTokenizer, but it seems to handle only some of the scenarios, such as:

dfs -mkdir "hello world"   // StrTokenizer(cmd,splitDel,doubleQuo)
dfs -mkdir 'hello world'   // StrTokenizer(cmd,splitDel,singleQuo)

But it can't handle malformed input, such as:
dfs -mkdir abd'dbabe'//  and ' are not matched

Let me know if I missed something.

Thank you!

 hive CLI can't handle hadoop dfs command  with space and quotes.
 

 Key: HIVE-4577
 URL: https://issues.apache.org/jira/browse/HIVE-4577
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0, 0.10.0
Reporter: Bing Li
Assignee: Bing Li
 Fix For: 0.14.0

 Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
 HIVE-4577.3.patch.txt


 By design, Hive supports hadoop dfs commands in the Hive shell, for example:
 hive> dfs -mkdir /user/biadmin/mydir;
 but it behaves differently from hadoop if the path contains spaces and quotes:
 hive> dfs -mkdir hello; 
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 /user/biadmin/hello
 hive> dfs -mkdir 'world';
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 /user/biadmin/'world'
 hive> dfs -mkdir bei jing;
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 /user/biadmin/bei
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 /user/biadmin/jing
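
 For comparison, the behaviour a shell user would expect (quotes group words into one argument and are then removed, and unbalanced quotes are reported rather than passed through) can be sketched with Python's standard `shlex` module. This is an illustration of the expected tokenization only; it is not the StrTokenizer approach discussed in the comments, and `split_dfs_command` is a hypothetical helper name.

```python
import shlex


def split_dfs_command(line: str):
    """Split a command line the way a POSIX shell would.

    Returns (tokens, error): error is None when the quotes are balanced,
    otherwise tokens is None and error describes the problem.
    """
    try:
        return shlex.split(line), None
    except ValueError as exc:  # raised by shlex on an unclosed quote
        return None, str(exc)


# A quoted path containing a space stays one argument.
tokens, err = split_dfs_command('dfs -mkdir "bei jing"')

# Single quotes are stripped from the resulting argument.
quoted, _ = split_dfs_command("dfs -mkdir 'world'")

# Mismatched quotes are rejected instead of being split arbitrarily.
bad_tokens, bad_err = split_dfs_command('dfs -mkdir "abd\'dbabe')
```

 Under this scheme a quoted `bei jing` yields a single directory argument, and a quoted `world` yields a directory named `world` without the quote characters.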





[jira] [Commented] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT

2014-06-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041669#comment-14041669
 ] 

Hive QA commented on HIVE-2597:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12652100/HIVE-2597.4.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5655 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/570/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/570/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-570/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12652100

 Repeated key in GROUP BY is erroneously displayed when using DISTINCT
 -

 Key: HIVE-2597
 URL: https://issues.apache.org/jira/browse/HIVE-2597
 Project: Hive
  Issue Type: Bug
Reporter: Alex Rovner
Assignee: Navis
 Attachments: HIVE-2597.3.patch.txt, HIVE-2597.4.patch.txt, 
 HIVE-2597.D8967.1.patch, HIVE-2597.D8967.2.patch


 The following query was simplified for illustration purposes. 
 This works correctly:
 select client_tid, "" as myvalue1, "" as myvalue2 from clients cluster by 
 client_tid
 The intent here is to produce two empty columns in between data.
 The following query does not work:
 select distinct client_tid, "" as myvalue1, "" as myvalue2 from clients 
 cluster by client_tid
 FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY 
 The key is not repeated since the aliases were given. Seems like Hive is 
 ignoring the aliases when the distinct keyword is specified.





[jira] [Commented] (HIVE-6564) WebHCat E2E tests that launch MR jobs fail on check job completion timeout

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041678#comment-14041678
 ] 

Ashutosh Chauhan commented on HIVE-6564:


+1

 WebHCat E2E tests that launch MR jobs fail on check job completion timeout
 --

 Key: HIVE-6564
 URL: https://issues.apache.org/jira/browse/HIVE-6564
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-6564.2.patch, HIVE-6564.patch


 WebHCat E2E tests that fire off an MR job are not correctly detected as 
 complete, so those tests time out.
 The problem occurs because the JSON module available through CPAN returns 
 1 or 0 instead of true or false.
 NO PRECOMMIT TESTS
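
 One way to make a completion check robust to this kind of difference is to accept both boolean and numeric encodings of the flag. The sketch below is in Python rather than the Perl used by the test harness, and both the `is_complete` helper and the `completed` field name are hypothetical.

```python
import json


def is_complete(value) -> bool:
    """Interpret true/false, 1/0, and their string forms as a completion flag."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return value != 0
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1")
    return False  # anything else (None, missing) counts as not complete


# The same status expressed with a JSON boolean and with a number.
as_bool = json.loads('{"completed": true}')["completed"]
as_int = json.loads('{"completed": 1}')["completed"]
```

 A check written this way gives the same answer whichever encoding the JSON decoder produces.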





[jira] [Commented] (HIVE-7241) Wrong lock acquired for alter table rename partition

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041679#comment-14041679
 ] 

Ashutosh Chauhan commented on HIVE-7241:


+1

 Wrong lock acquired for alter table rename partition
 

 Key: HIVE-7241
 URL: https://issues.apache.org/jira/browse/HIVE-7241
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7241.patch, HIVE-7241.patch


 Doing an alter table foo partition (bar='x') rename to partition (bar='y') 
 acquires a read lock on table foo.  It should instead acquire an exclusive 
 lock on partition bar=x.





[jira] [Commented] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock

2014-06-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041685#comment-14041685
 ] 

Ashutosh Chauhan commented on HIVE-7242:


+1

 alter table drop partition is acquiring the wrong type of lock
 --

 Key: HIVE-7242
 URL: https://issues.apache.org/jira/browse/HIVE-7242
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.14.0

 Attachments: HIVE-7242.patch


 Doing an alter table foo drop partition ('bar=x') acquired a shared-write 
 lock on partition bar=x.  It should be acquiring an exclusive lock in that 
 case.
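
 Why the lock type matters can be shown with a small compatibility table. The sketch assumes a common three-mode scheme in which shared-read locks coexist with shared-read and shared-write locks, shared-write locks conflict with each other, and exclusive locks conflict with everything; the names and the table are assumptions made for illustration, not Hive's lock manager code.

```python
# Unordered pairs of lock modes that may be held on the same object at once.
COMPATIBLE = {
    frozenset({"shared_read"}),                  # read + read
    frozenset({"shared_read", "shared_write"}),  # read + write
}


def can_coexist(held: str, requested: str) -> bool:
    """True if a lock of mode `requested` can be granted while `held` is held."""
    return frozenset({held, requested}) in COMPATIBLE


# With only a shared-write lock held by the drop, a reader could still
# acquire a shared-read lock on the partition being dropped.
reader_allowed_during_shared_write = can_coexist("shared_write", "shared_read")

# An exclusive lock keeps readers out until the drop has finished.
reader_allowed_during_exclusive = can_coexist("exclusive", "shared_read")
```

 Under these assumptions, a drop that takes only a shared-write lock does not stop concurrent readers of the partition, whereas an exclusive lock does.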





[jira] [Commented] (HIVE-7258) Move qtest-Driver properties from pom to separate file

2014-06-23 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041691#comment-14041691
 ] 

Gunther Hagleitner commented on HIVE-7258:
--

[~szehon] - is that what you were looking for? Can you pull the values from 
that file?

 Move qtest-Driver properties from pom to separate file  
 

 Key: HIVE-7258
 URL: https://issues.apache.org/jira/browse/HIVE-7258
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Gunther Hagleitner
 Attachments: HIVE-7258.1.patch








[jira] [Assigned] (HIVE-7258) Move qtest-Driver properties from pom to separate file

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-7258:


Assignee: Gunther Hagleitner



[jira] [Updated] (HIVE-7258) Move qtest-Driver properties from pom to separate file

2014-06-23 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7258:
-

Status: Patch Available  (was: Open)


