[jira] [Created] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT

2015-06-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10975:
---

 Summary: Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
 Key: HIVE-10975
 URL: https://issues.apache.org/jira/browse/HIVE-10975
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Priority: Minor


There have been a lot of changes in Parquet since its graduation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 35107: HIVE-6791 Support variable substition for Beeline shell command

2015-06-10 Thread Xuefu Zhang


 On June 9, 2015, 4:41 a.m., Xuefu Zhang wrote:
  Besides the two minor issues I found in the patch, I was wondering whether 
  the approach we are taking is the best one. Variable substitution is a 
  server (HS2) behavior, and on this ground I think it should happen in HS2 
  instead of Beeline. Please note that a JDBC client may also submit queries 
  containing $var, and such a case should also be supported.
  
  I also noticed that in Driver class, there is code handling variable 
  substitution. I'm wondering why it's not effective.
  
  A shell command (starting with !sh) is executed in the client (Beeline). I 
  think we are fine if variable substitution doesn't work for shell commands. 
  We can address that as a follow-up task if desirable.
 
 cheng xu wrote:
 Thanks for your comments. 
 
 `I also noticed that in Driver class, there is code handling variable 
 substitution. I'm wondering why it's not effective.`
 
 Substitution currently works well in HS2.
 I have two reasons for adding an API that gets the conf from HS2. One is 
 to support substitution in the sh and source commands. In the old CLI, both 
 commands worked well with substitution, so this part of the patch addresses 
 that. The other is 
 https://issues.apache.org/jira/browse/HIVE-10847, which requires some 
 configuration from hive-site.xml.
 
 Xuefu Zhang wrote:
 Yeah. It's a little trickier than I thought. A shell command is executed on 
 the client side (Beeline), and it doesn't seem to make sense to use 
 server-specific variables such as (env, sys, hiveconf, hivevar) in shell 
 commands. More importantly, Beeline can connect to multiple servers at the 
 same time, so which configuration should be used to substitute the 
 variables? Users should also be able to execute shell commands w/o any 
 server connection.
 
 For the CLI, server and client are together, so these don't matter. But for 
 a Beeline + HS2 deployment, the story will be different.
 
 I don't know what's best; all I'm saying is that we need to be very 
 careful about what we are doing. Before we decide what to do, we need to 
 clearly define the problem we are trying to solve.
 
 cheng xu wrote:
 Thank you for your prompt reply.
 `Shell command is executed at client side (Beeline) and it doesn't seem 
 to make sense to use server-specific variables such as (env, sys, hiveconf, 
 hivevar) in the shell commands.`
 I'm not sure whether substitution for sh and source is useful. For Beeline, 
 we can enable substitution only after a connection is established. For the 
 new CLI, which uses an embedded connection, it should be supported for 
 backwards compatibility. 
 
 I am a little confused about `connect to multiple servers at the same 
 time`. Does it mean that you can use Beeline to connect to any server in one 
 connection and that you can have multiple Beeline instances running? (That is 
 the case where the user executes the command 
 */opt/apache-hive-1.2.0-SNAPSHOT-bin/bin/hive --service beeline* with 
 specifying any hostname.) 
 If so, I think it may cause errors when no connection is available, since 
 the current implementation relies on a connection and uses **SetProcessor**. 
 AFAIK, it's safe to get the configuration from HS2 via **SetProcessor**, 
 which is what Beeline actually does after a connection is established. 
 A connection (session) should be associated with only one server. If the 
 user hasn't connected to any HS2, substitution for *sh* and *source* should 
 be disabled. To be honest, this will have some negative impact on 
 performance, since it requires executing a set command. For that reason, we 
 can make this support configurable.
 
 In summary, for backwards compatibility, substitution for the source and sh 
 commands is enabled only once a connection is established. We can also 
 disable the support for Beeline if it is not reasonable or hurts performance. 
 For HIVE-10847, I think we still need a way to access the configuration 
 from the server side, but it is only needed when starting a connection.
 
 Any thoughts?
 
 cheng xu wrote:
 Sorry for the typo below.
 I am a little confused about connect to multiple servers at the same time. 
 Does it mean that you can use Beeline to connect to any server in one 
 connection and that you can have multiple Beeline instances running? (That is 
 the case where the user executes the command 
 /opt/apache-hive-1.2.0-SNAPSHOT-bin/bin/hive --service beeline **without** 
 specifying any hostname.)

I think it's possible to start Beeline without any connection. To do that, just 
run beeline w/o the -u parameter. Once Beeline starts, you can run !connect 
<jdbc url> to make a connection. I believe it's also possible to make another 
connection using !connect <jdbc url> w/o disconnecting from the previous 
connection. You can run !list to get a list of connections, and !go <index> 
to select a particular connection as active. In addition, you 
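
The rule discussed in this thread (substitute variables in sh and source commands only when a connection is active, and take the values from that connection's configuration) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the class SubstitutionSketch, the method substitute, and the map layout with keys such as "hiveconf:scratch.dir" are hypothetical, and only the hiveconf and hivevar namespaces are handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstitutionSketch {
    // Matches ${hiveconf:name} or ${hivevar:name}; other namespaces are ignored in this sketch.
    private static final Pattern VAR =
        Pattern.compile("\\$\\{(hiveconf|hivevar):([^}]+)\\}");

    /**
     * Replaces variables using the configuration of the active connection.
     * activeConf is null when there is no connection (hypothetical convention);
     * in that case the command is returned unchanged.
     */
    static String substitute(String command, Map<String, String> activeConf) {
        if (activeConf == null) {
            return command;
        }
        Matcher m = VAR.matcher(command);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = activeConf.get(m.group(1) + ":" + m.group(2));
            // Unknown variables are left untouched instead of being blanked out.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hiveconf:scratch.dir", "/tmp/hive"); // made-up value for illustration
        System.out.println(substitute("!sh ls ${hiveconf:scratch.dir}", conf));
        System.out.println(substitute("!sh ls ${hiveconf:scratch.dir}", null));
    }
}
```

Passing the configuration of whichever connection is currently selected keeps the question of "which server's values?" explicit at the call site, which is the concern raised about multiple simultaneous connections.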

[jira] [Created] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start

2015-06-10 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-10976:
--

 Summary: Redundant HiveMetaStore connect check in HS2 CLIService start
 Key: HIVE-10976
 URL: https://issues.apache.org/jira/browse/HIVE-10976
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Priority: Trivial


During HS2 startup, CLIService start() runs a connection test against HMS.
The test is redundant, because during its init stage CLIService calls 
applyAuthorizationConfigPolicy, which starts a sessionState and establishes a 
connection to HMS. 
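
A toy model of the lifecycle described above may make the redundancy easier to see. It is a sketch under assumptions, not Hive code: FakeMetastore and Service are hypothetical stand-ins, and the only thing modeled is how many connection attempts happen across init and start.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StartupCheckSketch {
    /** Hypothetical stand-in for a metastore client; it only counts connection attempts. */
    static class FakeMetastore {
        final AtomicInteger connects = new AtomicInteger();

        void connect() {
            connects.incrementAndGet();
        }
    }

    /** Hypothetical service with the init/start lifecycle described in the issue. */
    static class Service {
        private final FakeMetastore metastore;
        private final boolean checkOnStart;

        Service(FakeMetastore metastore, boolean checkOnStart) {
            this.metastore = metastore;
            this.checkOnStart = checkOnStart;
        }

        void init() {
            // Stands in for applyAuthorizationConfigPolicy: starting the session connects to the metastore.
            metastore.connect();
        }

        void start() {
            if (checkOnStart) {
                // The extra connection test that the issue describes as redundant.
                metastore.connect();
            }
        }
    }

    /** Runs init and start once and reports how many connection attempts were made. */
    static int connectsDuringStartup(boolean checkOnStart) {
        FakeMetastore metastore = new FakeMetastore();
        Service service = new Service(metastore, checkOnStart);
        service.init();
        service.start();
        return metastore.connects.get();
    }

    public static void main(String[] args) {
        System.out.println("with check in start(): " + connectsDuringStartup(true));
        System.out.println("without check in start(): " + connectsDuringStartup(false));
    }
}
```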





Re: Review Request 35107: HIVE-6791 Support variable substition for Beeline shell command

2015-06-10 Thread cheng xu



Hive-0.14 - Build # 980 - Failure

2015-06-10 Thread Apache Jenkins Server
Changes for Build #980



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #980)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-0.14/980/ to view 
the results.

Re: Review Request 35323: HIVE-10972 DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-10 Thread Damien Carol

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35323/#review87448
---



ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
https://reviews.apache.org/r/35323/#comment139759

Is this change connected to this bug?



ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
https://reviews.apache.org/r/35323/#comment139758

space is missing after 'key.getName() + '
just before 'with'


- Damien Carol


On June 10, 2015, 8:35 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35323/
 ---
 
 (Updated June 10, 2015, 8:35 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-10972 DummyTxnManager always locks the current database in shared mode, 
 which is incorrect.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ea04415bc0054e8689a118d2d726ede2ad1e3b80 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java e75a90a449ef7db95f5808302607fa5f04c9b8c8 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 4f86dd9aa539ebdbd2412d4cbcf857dd3e614858 
   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 5abb729fe18d49591857c4ede2d9307274448c62 
 
 Diff: https://reviews.apache.org/r/35323/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




Review Request 35323: HIVE-10972 DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-10 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35323/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-10972 DummyTxnManager always locks the current database in shared mode, 
which is incorrect.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ea04415bc0054e8689a118d2d726ede2ad1e3b80 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java e75a90a449ef7db95f5808302607fa5f04c9b8c8 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 4f86dd9aa539ebdbd2412d4cbcf857dd3e614858 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 5abb729fe18d49591857c4ede2d9307274448c62 

Diff: https://reviews.apache.org/r/35323/diff/


Testing
---


Thanks,

Aihua Xu



[jira] [Created] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled

2015-06-10 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-10977:
--

 Summary: No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
 Key: HIVE-10977
 URL: https://issues.apache.org/jira/browse/HIVE-10977
 Project: Hive
  Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Priority: Minor


When hive.metastore.try.direct.sql is set to false, HMS uses JDO to 
retrieve data, so it is not necessary to instantiate an expensive 
MetaStoreDirectSql during ObjectStore initialization.
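
The idea of skipping the expensive object when the flag is off can be sketched as below. This is a minimal illustration, not the ObjectStore implementation: ExpensiveDirectSql and Store are hypothetical stand-ins, and the boolean constructor argument merely plays the role of hive.metastore.try.direct.sql.

```java
public class DirectSqlSketch {
    /** Hypothetical stand-in for an expensive helper such as MetaStoreDirectSql. */
    static class ExpensiveDirectSql {
        static int instances = 0; // counts constructions so the effect is observable

        ExpensiveDirectSql() {
            instances++;
        }
    }

    /** Hypothetical store that builds the helper only when direct SQL is enabled. */
    static class Store {
        private final ExpensiveDirectSql directSql;

        Store(boolean tryDirectSql) {
            // tryDirectSql plays the role of hive.metastore.try.direct.sql.
            this.directSql = tryDirectSql ? new ExpensiveDirectSql() : null;
        }

        /** Reports which retrieval path this store would use. */
        String queryPath() {
            return (directSql != null) ? "direct-sql" : "jdo";
        }
    }

    public static void main(String[] args) {
        Store jdoOnly = new Store(false);
        System.out.println(jdoOnly.queryPath() + ", helpers built: " + ExpensiveDirectSql.instances);
        Store withDirectSql = new Store(true);
        System.out.println(withDirectSql.queryPath() + ", helpers built: " + ExpensiveDirectSql.instances);
    }
}
```

Any code path that later uses the helper then needs a null check (or an equivalent guard), which is the cost of skipping the construction.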





Re: Review Request 35323: HIVE-10972 DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-10 Thread Aihua Xu


 On June 10, 2015, 8:44 p.m., Damien Carol wrote:
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java, line 54
  https://reviews.apache.org/r/35323/diff/1/?file=982259#file982259line54
 
  Is this change connected to this bug?

It is not directly related to the bug, but I found the issue when I created the 
new unit test, which will throw an NPE. Do you suggest fixing it separately?


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35323/#review87448
---






Re: Review Request 35323: HIVE-10972 DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-10 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35323/
---

(Updated June 10, 2015, 9:08 p.m.)


Review request for hive.


Repository: hive-git


Description
---

HIVE-10972 DummyTxnManager always locks the current database in shared mode, 
which is incorrect.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ea04415 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java e75a90a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 4f86dd9 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 5abb729 

Diff: https://reviews.apache.org/r/35323/diff/


Testing
---


Thanks,

Aihua Xu



[ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Carl Steinbach
I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
elected to the Hive Project Management Committee. Please join me in
congratulating Chao and Gopal!

Thanks.

- Carl


Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Lefty Leverenz
Kudos, Chao and Gopal!  Thanks for all your contributions.

-- Lefty

On Wed, Jun 10, 2015 at 2:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!

 Thanks.

 - Carl



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Jimmy Xiang
Congrats!

On Wed, Jun 10, 2015 at 2:43 PM, Hari Subramaniyan 
hsubramani...@hortonworks.com wrote:

 Congrats Chao and Gopal!
 
 From: Lefty Leverenz leftylever...@gmail.com
 Sent: Wednesday, June 10, 2015 2:22 PM
 To: dev@hive.apache.org
 Subject: Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal
 Vijayaraghavan

 Kudos, Chao and Gopal!  Thanks for all your contributions.

 -- Lefty

 On Wed, Jun 10, 2015 at 2:20 PM, Carl Steinbach c...@apache.org wrote:

  I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
  elected to the Hive Project Management Committee. Please join me in
  congratulating Chao and Gopal!
 
  Thanks.
 
  - Carl
 



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Hari Subramaniyan
Congrats Chao and Gopal!

From: Lefty Leverenz leftylever...@gmail.com
Sent: Wednesday, June 10, 2015 2:22 PM
To: dev@hive.apache.org
Subject: Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

Kudos, Chao and Gopal!  Thanks for all your contributions.

-- Lefty

On Wed, Jun 10, 2015 at 2:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!

 Thanks.

 - Carl



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Sergio Pena
Congratulations Chao and Gopal !!!

On Wed, Jun 10, 2015 at 4:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!

 Thanks.

 - Carl



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Chaoyu Tang
Congratulations to all!

On Wed, Jun 10, 2015 at 5:53 PM, Vaibhav Gumashta vgumas...@hortonworks.com
 wrote:

 Congrats guys.

 - Vaibhav

 On 6/10/15, 2:51 PM, Sergio Pena sergio.p...@cloudera.com wrote:

 Congratulations Chao and Gopal !!!
 
 On Wed, Jun 10, 2015 at 4:20 PM, Carl Steinbach c...@apache.org wrote:
 
  I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
  elected to the Hive Project Management Committee. Please join me in
  congratulating Chao and Gopal!
 
  Thanks.
 
  - Carl
 




Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Gopal Vijayaraghavan
Hi,

Thanks everyone and Congratulations to Chao!

Cheers,
Gopal

On 6/10/15, 2:20 PM, Carl Steinbach c...@apache.org wrote:

I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
elected to the Hive Project Management Committee. Please join me in
congratulating Chao and Gopal!

Thanks.

- Carl




Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Lenni Kuff
Congratulations!

On Wed, Jun 10, 2015 at 2:44 PM, Jimmy Xiang jxi...@cloudera.com wrote:

 Congrats!

 On Wed, Jun 10, 2015 at 2:43 PM, Hari Subramaniyan 
 hsubramani...@hortonworks.com wrote:

  Congrats Chao and Gopal!
  
  From: Lefty Leverenz leftylever...@gmail.com
  Sent: Wednesday, June 10, 2015 2:22 PM
  To: dev@hive.apache.org
  Subject: Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal
  Vijayaraghavan
 
  Kudos, Chao and Gopal!  Thanks for all your contributions.
 
  -- Lefty
 
  On Wed, Jun 10, 2015 at 2:20 PM, Carl Steinbach c...@apache.org wrote:
 
   I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
   elected to the Hive Project Management Committee. Please join me in
   congratulating Chao and Gopal!
  
   Thanks.
  
   - Carl
  
 



Re: Review Request 35107: HIVE-6791 Support variable substition for Beeline shell command

2015-06-10 Thread Lefty Leverenz



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Prasanth Jayachandran
Congratulations!!

Thanks
Prasanth




On Wed, Jun 10, 2015 at 2:53 PM -0700, Vaibhav Gumashta 
vgumas...@hortonworks.com wrote:

Congrats guys.

- Vaibhav

On 6/10/15, 2:51 PM, Sergio Pena sergio.p...@cloudera.com wrote:

Congratulations Chao and Gopal !!!

On Wed, Jun 10, 2015 at 4:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!

 Thanks.

 - Carl




Re: Revise docs for Hive indexing

2015-06-10 Thread Lefty Leverenz
Any more input about the documentation for Hive indexing?

-- Lefty

On Thu, May 28, 2015 at 10:26 AM, John Pullokkaran 
jpullokka...@hortonworks.com wrote:

 Group-by (GB) does use indexes in some special cases.

 Thanks
 John


 On 5/27/15, 10:31 PM, Lefty Leverenz leftylever...@gmail.com wrote:

 Will Hive indexing ever be fixed?  If not, should we remove the doc I
 cobbled together (Indexing:
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing)
 or just revise it?  And should the design doc be moved from the Completed
 section to Incomplete (Hive Design Docs:
 https://cwiki.apache.org/confluence/display/Hive/DesignDocs)?
 
 What about bitmap indexes, do they work (Bitmap Indexes:
 https://cwiki.apache.org/confluence/display/Hive/IndexDev+Bitmap --
 HIVE-1803: https://issues.apache.org/jira/browse/HIVE-1803)?
 
 -- Lefty




Re: Review Request 35323: HIVE-10972 DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-10 Thread Aihua Xu


 On June 10, 2015, 8:44 p.m., Damien Carol wrote:
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java, line 54
  https://reviews.apache.org/r/35323/diff/1/?file=982259#file982259line54
 
  Is this change connected to this bug?
 
 Aihua Xu wrote:
 Not directly related to the bug, but I found this issue when I created the
 new unit test, which would throw an NPE. Do you suggest fixing it separately?

I will still keep it since the unit test detects it.


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35323/#review87448
---


On June 10, 2015, 8:35 p.m., Aihua Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35323/
 ---
 
 (Updated June 10, 2015, 8:35 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-10972 DummyTxnManager always locks the current database in shared mode, 
 which is incorrect.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
 ea04415bc0054e8689a118d2d726ede2ad1e3b80 
   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 
 e75a90a449ef7db95f5808302607fa5f04c9b8c8 
   
 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
  4f86dd9aa539ebdbd2412d4cbcf857dd3e614858 
   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java 
 5abb729fe18d49591857c4ede2d9307274448c62 
 
 Diff: https://reviews.apache.org/r/35323/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aihua Xu
 




Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Vaibhav Gumashta
Congrats guys.

- Vaibhav

On 6/10/15, 2:51 PM, Sergio Pena sergio.p...@cloudera.com wrote:

Congratulations Chao and Gopal !!!

On Wed, Jun 10, 2015 at 4:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!

 Thanks.

 - Carl




Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Alexander Pivovarov
Congratulations to both of you!!!

On Wed, Jun 10, 2015 at 3:26 PM, Gopal Vijayaraghavan gop...@apache.org
wrote:

 Hi,

 Thanks everyone and Congratulations to Chao!

 Cheers,
 Gopal

 On 6/10/15, 2:20 PM, Carl Steinbach c...@apache.org wrote:

 I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
 elected to the Hive Project Management Committee. Please join me in
 congratulating Chao and Gopal!
 
 Thanks.
 
 - Carl





[jira] [Created] (HIVE-10978) Document fs.trash.interval wrt Hive and HDFS Encryption

2015-06-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10978:
-

 Summary: Document fs.trash.interval wrt Hive and HDFS Encryption
 Key: HIVE-10978
 URL: https://issues.apache.org/jira/browse/HIVE-10978
 Project: Hive
  Issue Type: Bug
  Components: Documentation, Security
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Priority: Critical


When HDFS is encrypted (TDE is enabled), DROP TABLE and DROP PARTITION have
unexpected behavior when the Hadoop Trash feature is enabled.
The latter is enabled by setting fs.trash.interval > 0 in core-site.xml.
When Trash is enabled, the data file for the table should be moved to the Trash
bin. If the table is inside an Encryption Zone, this move operation is not
allowed.
There are 2 ways to deal with this:
1. use PURGE, as in DROP TABLE blah PURGE. This skips the Trash bin even if 
enabled.
2. set fs.trash.interval = 0. It is critical that this config change is done in 
core-site.xml. Setting it in hive-site.xml may lead to very strange behavior 
where the table metadata is deleted but the data file remains.  This will lead 
to data corruption if a table with the same name is later created.
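
The cases described above can be summarized as a small decision table. What follows is only a sketch in plain Java, not Hive or Hadoop code: the class `TrashDropSketch`, the enum `DropOutcome`, and the method `decide` are hypothetical names introduced for illustration, and the sketch covers only the situations mentioned in this issue.

```java
// Hypothetical model of the DROP TABLE / DROP PARTITION cases described above.
// This is NOT Hive or Hadoop code; every name here is an illustrative assumption.
final class TrashDropSketch {

    enum DropOutcome {
        DELETED_SKIPPING_TRASH, // PURGE was used, or Trash is disabled
        MOVED_TO_TRASH,         // Trash enabled, table outside an Encryption Zone
        MOVE_NOT_ALLOWED        // Trash enabled, table inside an Encryption Zone
    }

    /**
     * @param fsTrashInterval  value of fs.trash.interval as read from core-site.xml
     * @param inEncryptionZone whether the table data lives inside an Encryption Zone
     * @param purge            whether the statement used PURGE (DROP TABLE blah PURGE)
     */
    static DropOutcome decide(long fsTrashInterval, boolean inEncryptionZone, boolean purge) {
        boolean trashEnabled = fsTrashInterval > 0;
        if (purge || !trashEnabled) {
            // Workarounds 1 and 2 from the issue: the Trash bin is not involved.
            return DropOutcome.DELETED_SKIPPING_TRASH;
        }
        if (inEncryptionZone) {
            // The problematic case: the move to the Trash bin is not allowed.
            return DropOutcome.MOVE_NOT_ALLOWED;
        }
        return DropOutcome.MOVED_TO_TRASH;
    }

    public static void main(String[] args) {
        System.out.println(decide(360, true, false));
        System.out.println(decide(360, true, true));
        System.out.println(decide(0, true, false));
    }
}
```

The sketch deliberately takes a single `fsTrashInterval` value, mirroring the point above that the setting should be changed in core-site.xml rather than in hive-site.xml.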





Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Chao Sun
Thanks Lenni!

On Wed, Jun 10, 2015 at 2:46 PM, Lenni Kuff lsk...@cloudera.com wrote:

 Congratulation!

 On Wed, Jun 10, 2015 at 2:44 PM, Jimmy Xiang jxi...@cloudera.com wrote:

  Congrats!
 
  On Wed, Jun 10, 2015 at 2:43 PM, Hari Subramaniyan 
  hsubramani...@hortonworks.com wrote:
 
   Congrats Chao and Gopal!
   
   From: Lefty Leverenz leftylever...@gmail.com
   Sent: Wednesday, June 10, 2015 2:22 PM
   To: dev@hive.apache.org
   Subject: Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal
   Vijayaraghavan
  
   Kudos, Chao and Gopal!  Thanks for all your contributions.
  
   -- Lefty
  
   On Wed, Jun 10, 2015 at 2:20 PM, Carl Steinbach c...@apache.org
 wrote:
  
    I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
    elected to the Hive Project Management Committee. Please join me in
    congratulating Chao and Gopal!
   
Thanks.
   
- Carl
   
  
 



Hosting Hive User Group Meeting During Hadoop World NY

2015-06-10 Thread Xuefu Zhang
Dear Hive users,

The Hive community is considering a user group meeting during Hadoop World, which
will be held in New York at the end of September. To make this happen, your
support is essential. First, I'm wondering if any user in the New York area
would be willing to host the meetup. Second, I'm soliciting talks from
users as well as developers, so please propose talks or share your thoughts
on the contents of the meetup.

I will soon set up a meetup event to formally announce this. In the
meantime, your suggestions, comments, and kind assistance are greatly
appreciated.

Sincerely,
 Xuefu


Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Naveen Gangam
Congrats Chao and Gopal.

On Wed, Jun 10, 2015 at 7:53 PM, Chao Sun c...@cloudera.com wrote:

 Thanks everyone! I feel honored! Also congratulations to Gopal!

 Best,
 Chao

 On Wed, Jun 10, 2015 at 4:01 PM, Alexander Pivovarov apivova...@gmail.com
 
 wrote:

  Congratulations to both of you!!!
 
  On Wed, Jun 10, 2015 at 3:26 PM, Gopal Vijayaraghavan gop...@apache.org
 
  wrote:
 
   Hi,
  
   Thanks everyone and Congratulations to Chao!
  
   Cheers,
   Gopal
  
   On 6/10/15, 2:20 PM, Carl Steinbach c...@apache.org wrote:
  
   I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
   elected to the Hive Project Management Committee. Please join me in
   congratulating Chao and Gopal!
   
   Thanks.
   
   - Carl
  
  
  
 



Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Chao Sun
Thanks everyone! I feel honored! Also congratulations to Gopal!

Best,
Chao

On Wed, Jun 10, 2015 at 4:01 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 Congratulations to both of you!!!

 On Wed, Jun 10, 2015 at 3:26 PM, Gopal Vijayaraghavan gop...@apache.org
 wrote:

  Hi,
 
  Thanks everyone and Congratulations to Chao!
 
  Cheers,
  Gopal
 
  On 6/10/15, 2:20 PM, Carl Steinbach c...@apache.org wrote:
 
  I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
  elected to the Hive Project Management Committee. Please join me in
  congratulating Chao and Gopal!
  
  Thanks.
  
  - Carl
 
 
 



RE: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

2015-06-10 Thread Xu, Cheng A
Congrats!

-Original Message-
From: Naveen Gangam [mailto:ngan...@cloudera.com] 
Sent: Thursday, June 11, 2015 8:30 AM
To: dev@hive.apache.org
Subject: Re: [ANNOUNCE] New Hive PMC Members - Chao Sun and Gopal Vijayaraghavan

Congrats Chao and Gopal.

On Wed, Jun 10, 2015 at 7:53 PM, Chao Sun c...@cloudera.com wrote:

 Thanks everyone! I feel honored! Also congratulations to Gopal!

 Best,
 Chao

 On Wed, Jun 10, 2015 at 4:01 PM, Alexander Pivovarov 
 apivova...@gmail.com
 
 wrote:

  Congratulations to both of you!!!
 
  On Wed, Jun 10, 2015 at 3:26 PM, Gopal Vijayaraghavan 
  gop...@apache.org
 
  wrote:
 
   Hi,
  
   Thanks everyone and Congratulations to Chao!
  
   Cheers,
   Gopal
  
   On 6/10/15, 2:20 PM, Carl Steinbach c...@apache.org wrote:
  
   I am pleased to announce that Chao Sun and Gopal Vijayaraghavan have been
   elected to the Hive Project Management Committee. Please join me in
   congratulating Chao and Gopal!
   
   Thanks.
   
   - Carl
  
  
  
 



Re: Review Request 35107: HIVE-6791 Support variable substition for Beeline shell command

2015-06-10 Thread cheng xu


 On June 9, 2015, 12:41 p.m., Xuefu Zhang wrote:
  Besides the two minor issues I found in the patch, I was wondering if the 
  approach we are taking is the best. Variable substitution is a server (HS2) 
  behavior, and on this ground I think this should happen in HS2 instead of 
  beeline. Please note that JDBC client may also submit queries with $var in 
  it, and such a case should be also supported.
  
  I also noticed that in Driver class, there is code handling variable 
  substitution. I'm wondering why it's not effective.
  
  Shell command (starting with !sh) is executed in the client (Beeline). I
  think we are fine if variable substitution doesn't work for shell command.
  We can address that as a followup task if desirable.
 
 cheng xu wrote:
 Thanks for your comments. 
 
 `I also noticed that in Driver class, there is code handling variable 
 substitution. I'm wondering why it's not effective.`
 
 The substitution works well in HS2 currently.
 There are two reasons for me to add API getting the conf from HS2. One is 
 to support substitution in sh and source command. In the old cli, source 
 command and sh command worked well with substitution. So this part of this 
 patch is addressing this purpose. Another consideration is for 
 https://issues.apache.org/jira/browse/HIVE-10847 which required some 
 configuration from hive-site.xml.
 
 Xuefu Zhang wrote:
 Yeah. It's a little trickier than I thought. Shell command is executed at
 client side (Beeline) and it doesn't seem to make sense to use server specific
 variables such as (env, sys, hiveconf, hivevar) in the shell commands. More
 importantly, Beeline can connect to multiple servers at the same time, so
 which configurations should be used to substitute the variables? User should
 be able to execute shell commands w/o any server connection.
 
 For CLI, server and client are together, so these don't matter. But for 
 beeline + HS2 deployment, story will be different.
 
 I don't know what's the best, and all I'm saying is that we need to be
 very careful about what we are doing. Before we decide what to do, we need to
 clearly define the problem we are trying to solve first.
 
 cheng xu wrote:
 Thank you for your prompt reply.
 `Shell command is executed at client side (Beeline) and it doesn't seem
 to make sense to use server specific variables such as (env, sys, hiveconf,
 hivevar) in the shell commands.`
 I'm not sure whether substitution for sh and source is useful. For beeline, we
 can enable substitution only after a connection is established. For the new
 CLI, which uses an embedded connection, it should be supported for backwards
 compatibility.
 
 I am a little confused about `connect to multiple servers at the same 
 time`. Does it mean you can use beeline to connect any server in one 
 connection and you can have multiple beeline instances running? (It's the 
 case that user executes the command 
 */opt/apache-hive-1.2.0-SNAPSHOT-bin/bin/hive --service beeline* with specify 
 any hostname) 
 If so, I think it may cause some errors if no connection is available, since
 the current implementation relies on a connection by using **SetProcessor**.
 AFAIK, it's safe to get the configurations from HS2 via **SetProcessor**,
 which is what beeline actually does after a connection is established.
 A connection (session) should only be associated with one server. If the user
 hasn't connected to any HS2, the substitution for *sh* and *source* should be
 disabled. To be honest, this will hurt performance somewhat, since it requires
 executing a set command. With regard to performance, we can make this support
 configurable.
 
 In summary, for backwards compatibility, substitution for the source and sh
 commands is enabled only once a connection is established. And we can
 disable the support for beeline if it is not reasonable or lowers performance.
 For HIVE-10847, I think we still need a way to access the configuration
 from the server side, but it is only needed when starting a connection.
 
 Any thoughts?
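
To make the proposal above concrete, here is a minimal sketch in plain Java of client-side substitution that is skipped when there is no connection. It is not the code from the patch under review: the class `ShellSubstitutionSketch`, the method `substitute`, and the convention that the map holds keys such as `hiveconf:name` (as might be collected from the server after connecting) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not Beeline's implementation.
final class ShellSubstitutionSketch {

    // Matches references of the form ${hiveconf:name}, ${hivevar:name}, and so on.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * @param command    the text of a sh or source command
     * @param serverConf variables fetched from the connected server, keyed as
     *                   "hiveconf:name" (assumed format); null means no connection
     */
    static String substitute(String command, Map<String, String> serverConf) {
        if (serverConf == null) {
            // No HS2 connection: substitution is disabled, command is left as typed.
            return command;
        }
        Matcher m = VAR.matcher(command);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = serverConf.get(m.group(1));
            // Unknown variables are left untouched rather than replaced with "".
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hiveconf:scratch", "/tmp/hive");
        System.out.println(substitute("ls ${hiveconf:scratch}", conf));
        System.out.println(substitute("ls ${hiveconf:scratch}", null));
    }
}
```

Passing the configuration in as a parameter keeps the sketch independent of how many connections are open; which connection's configuration to pass is exactly the open question in this thread.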
 
 cheng xu wrote:
 Sorry for below typo.
 I am a little confused about connecting to multiple servers at the same time.
 Does it mean you can use beeline to connect to any server in one connection and
 you can have multiple beeline instances running? (It's the case where the user
 executes the command /opt/apache-hive-1.2.0-SNAPSHOT-bin/bin/hive --service
 beeline **without** specifying any hostname)
 
 Xuefu Zhang wrote:
 I think it's possible to start beeline without any connection. To do 
 that, just run beeline w/o -u parameter. Once beeline starts, you can run 
 !connect jdbc url to make a connection. I believe it's also possible
 to make another connection using !connect jdbc url w/o disconnecting from
 the previous connection. You can run !list to get a list of connections, 
 and !go index to select a particular 

[jira] [Created] (HIVE-10980) Merge of dynamic partitions loads all data to default partition

2015-06-10 Thread Illya Yalovyy (JIRA)
Illya Yalovyy created HIVE-10980:


 Summary: Merge of dynamic partitions loads all data to default 
partition
 Key: HIVE-10980
 URL: https://issues.apache.org/jira/browse/HIVE-10980
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.14.0
 Environment: HDP 2.2.4 (also reproduced on apache hive built from 
trunk) 
Reporter: Illya Yalovyy


Conditions that lead to the issue:
1. Partition columns have different types
2. Both static and dynamic partitions are used in the query
3. Dynamically generated partitions require merge

Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.

Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
)
row format delimited
fields terminated by ',';

load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;

create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);

insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;

select * from tdp order by caid;
show partitions tdp;

Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2 
20150316,16,reqA,clusterIdC,cacheId3  
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5  

Actual result:
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__

dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__





Full text index functionality available?

2015-06-10 Thread Jalaj Thanaki
Hi Developers & Users Community,

My name is Jalaj. I'm using Hive.

I want to ask: is there any full-text index functionality available in Hive
for searching, matching, and ranking keywords in text? If such functionality
exists, please let me know.

Thanks,
Jalaj Thanaki.


[jira] [Created] (HIVE-10979) Fix failed tests in TestSchemaTool after the version number change in HIVE-10921

2015-06-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10979:
---

 Summary: Fix failed tests in TestSchemaTool after the version 
number change in HIVE-10921
 Key: HIVE-10979
 URL: https://issues.apache.org/jira/browse/HIVE-10979
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Some version variables in the SQL were not updated in HIVE-10921, which caused
unit tests to fail.





Re: Review Request 34586: HIVE-10704

2015-06-10 Thread Jason Dere


 On May 30, 2015, 2:57 a.m., Alexander Pivovarov wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java, line 
  201
  https://reviews.apache.org/r/34586/diff/2/?file=972516#file972516line201
 
  The comment above says - if any table has bad size estimate
  But why do you check totalSize <= 0 then?
  Shouldn't you iterate over all small tables and check that they all have
  a good size estimate?
  
  What if you have table sizes (100, -4, 0)
  totalSize is 96. But table #2 size is -4, which is bad size.
  
  To make the code clear I recommend adding a new boolean variable
  isAnyTableHasBadSize and setting its value in the place where you calc
  totalSize, biggest and maxSize

The logic here does still check each table individually to make sure that the
table has a valid size (lines 201-214). It just uses the initial check
(totalSize <= 0) to see whether iterating over the tables is even necessary. If
the total size is non-positive, we don't even have to bother checking each table
and we will automatically fall back to equal proportions.
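
The two-stage check described above might be sketched as follows. This is a simplified illustration in plain Java, not the code in HashTableLoader.java: the class `MemoryShareSketch` and the method `memoryShares` are hypothetical names, and treating any size <= 0 as an invalid estimate is an assumption of the sketch.

```java
import java.util.Arrays;

// Illustrative sketch only; not the HashTableLoader implementation.
final class MemoryShareSketch {

    /**
     * Returns the fraction of memory given to each small table.
     * Assumption: a size <= 0 is treated as an invalid estimate.
     */
    static double[] memoryShares(long[] tableSizes) {
        int n = tableSizes.length;
        long totalSize = 0;
        for (long size : tableSizes) {
            totalSize += size;
        }

        // Stage 1: cheap check on the total; if it is non-positive,
        // the per-table scan is unnecessary.
        boolean useEqualShares = totalSize <= 0;

        // Stage 2: otherwise, check each table individually, so that
        // sizes such as (100, -4, 0) are still caught even though the total is 96.
        if (!useEqualShares) {
            for (long size : tableSizes) {
                if (size <= 0) {
                    useEqualShares = true;
                    break;
                }
            }
        }

        double[] shares = new double[n];
        for (int i = 0; i < n; i++) {
            shares[i] = useEqualShares ? 1.0 / n : (double) tableSizes[i] / totalSize;
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(memoryShares(new long[] {100, -4, 0})));
        System.out.println(Arrays.toString(memoryShares(new long[] {50, 150})));
    }
}
```

With the reviewer's example sizes (100, -4, 0), stage 1 passes because the total is 96, and stage 2 is what triggers the fallback to equal shares.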


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34586/#review85853
---


On May 27, 2015, 6:33 a.m., Mostafa Mokhtar wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34586/
 ---
 
 (Updated May 27, 2015, 6:33 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 fix biggest small table selection when table sizes are 0
 fallback to dividing memory equally if any tables have invalid size
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c 
 
 Diff: https://reviews.apache.org/r/34586/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Mostafa Mokhtar
 




does hive cli support concurrency?

2015-06-10 Thread Xiaomeng Huang
Hi, I found a problem when I use hive.

1. In my environment, I have many threads, and every thread forks a process
to run a hive job using hive -e sql (not hiveserver2, just the hive cli).
Occasionally, one hive process doesn't exit when the job has finished. I
see that the job in yarn has finished, and I even see OK in the hive log. But
the hive process, named runjar, is still running.
When this issue occurs, I can only use kill -9 to terminate this process.

2. I don't set the configuration hive.support.concurrency, but I set the
configuration hive.exec.parallel to true.

3. My hive version is hive-0.12.0-cdh5.1.0.

4. This is the jstack of the hive process when it hangs:
$ cat 721783-jstack.txt
2015-06-10 19:59:30
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):

Attach Listener daemon prio=10 tid=0x7f0ed403b800 nid=0x5d08 waiting
on condition [0x]
   java.lang.Thread.State: RUNNABLE

org.apache.hadoop.hdfs.PeerCache@108e08dc daemon prio=10
tid=0x7f0ee8f1c800 nid=0x6188 waiting on condition [0x7f0ed89ba000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:245)
at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:41)
at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:119)
at java.lang.Thread.run(Thread.java:745)

Thread-24 daemon prio=10 tid=0x7f0ee8d52800 nid=0x6185 runnable
[0x7f0ed95e7000]
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
at java.lang.Thread.run(Thread.java:745)

Abandoned connection cleanup thread daemon prio=10 tid=0x7f0ee9148000
nid=0x6180 in Object.wait() [0x7f0ed96ea000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked 0xe003f0a8 (a java.lang.ref.ReferenceQueue$Lock)
at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:41)

Service Thread daemon prio=10 tid=0x7f0ee80cc000 nid=0x616e runnable
[0x]
   java.lang.Thread.State: RUNNABLE

C2 CompilerThread1 daemon prio=10 tid=0x7f0ee80c9800 nid=0x616d
waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

C2 CompilerThread0 daemon prio=10 tid=0x7f0ee80c6800 nid=0x616c
waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

Signal Dispatcher daemon prio=10 tid=0x7f0ee80c5000 nid=0x616b
runnable [0x]
   java.lang.Thread.State: RUNNABLE

Finalizer daemon prio=10 tid=0x7f0ee809d800 nid=0x6149 in
Object.wait() [0x7f0eda9c7000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xe003f2e8 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked 0xe003f2e8 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

Reference Handler daemon prio=10 tid=0x7f0ee8096000 nid=0x6148 in
Object.wait() [0x7f0edaac8000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xe0041870 (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked 0xe0041870 (a java.lang.ref.Reference$Lock)

main prio=10 tid=0x7f0ee8013800 nid=0x6134 runnable
[0x7f0eecc86000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:494)
at org.apache.hadoop.hive.ql.log.PerfLogger.PerfLogBegin(PerfLogger.java:98)
at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:909)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1099)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

Re: Review Request 34586: HIVE-10704

2015-06-10 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34586/#review87522
---


- Jason Dere


On May 27, 2015, 6:33 a.m., Mostafa Mokhtar wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34586/
 ---
 
 (Updated May 27, 2015, 6:33 a.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 fix biggest small table selection when table sizes are 0
 fallback to dividing memory equally if any tables have invalid size
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 536b92c 
 
 Diff: https://reviews.apache.org/r/34586/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Mostafa Mokhtar