[jira] [Created] (HIVE-6865) Failed to load data into Hive from Pig using HCatStorer()
Bing Li created HIVE-6865: - Summary: Failed to load data into Hive from Pig using HCatStorer() Key: HIVE-6865 URL: https://issues.apache.org/jira/browse/HIVE-6865 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Bing Li Assignee: Bing Li Reproduction steps: 1. create a hive table hive> create table t1 (c1 int, c2 int, c3 int); 2. start pig shell grunt> register $HIVE_HOME/lib/*.jar grunt> register $HIVE_HOME/hcatalog/share/hcatalog/*.jar grunt> A = load 'pig.txt' as (c1:int, c2:int, c3:int); grunt> store A into 't1' using org.apache.hive.hcatalog.HCatStorer(); Error Message: ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111) at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97) at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85) at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1000) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963) at java.security.AccessController.doPrivileged(AccessController.java:310) at javax.security.auth.Subject.doAs(Subject.java:573) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963) at org.apache.hadoop.mapreduce.Job.submit(Job.java:616) at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128) at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191) at java.lang.Thread.run(Thread.java:738) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270) -- This message was sent by Atlassian JIRA (v6.2#6252)
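For context on the error above: HCatException 2004 means the job reached HCatBaseOutputFormat.checkOutputSpecs without the output table ever being registered on the client. A minimal sketch of that registration step, assuming the org.apache.hive.hcatalog.mapreduce API and a Hadoop 2 style Job, is below; the driver class, database, and table names are illustrative only, and HCatStorer is expected to perform the equivalent call internally when it is set up correctly.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class HCatWriteDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "write-to-t1");

    // This is the call the error message refers to: setOutput must run on the
    // client before job submission, otherwise checkOutputSpecs fails with
    // "HCatOutputFormat not initialized, setOutput has to be called".
    HCatOutputFormat.setOutput(job,
        OutputJobInfo.create("default", "t1", null)); // null = no partition spec

    job.setOutputFormatClass(HCatOutputFormat.class);
    // Mapper/reducer setup producing HCatRecord values is omitted in this sketch.
  }
}
{code}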
[jira] [Commented] (HIVE-6831) The job schedule in condition task could not be correct with skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962630#comment-13962630 ] william zhu commented on HIVE-6831: --- 1. The query can be simplified like this: Select * from (select * from TableA union all select * from TableB) a 2. TableA and TableB each consist of other select queries. 3. And the Hive parameters I set are: set hive.auto.convert.join=false; set hive.optimize.skewjoin = true; set hive.skewjoin.key = 50; set hive.mapjoin.smalltable.filesize=5000; The job schedule in condition task could not be correct with skewed join optimization - Key: HIVE-6831 URL: https://issues.apache.org/jira/browse/HIVE-6831 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Environment: Hive 0.11.0 Reporter: william zhu Attachments: 6831.patch Code snippet in ConditionalTask.java as below: // resolved task if (driverContext.addToRunnable(tsk)) { console.printInfo(tsk.getId() + " is selected by condition resolver."); } The selected task is added into the runnable queue immediately without any dependency checking. If the selected task is the original task, and its parent task has not been executed, then the result will be incorrect. Like this: 1. Before skew join optimization: Step1, Step2 -- Step3 (Step1 and Step2 are Step3's parents) 2. After skew join optimization: Step1 - Step4 (ConditionTask) - consists of [Step3, Step10] Step2 - Step5 (ConditionTask) - consists of [Step3, Step11] 3. Running: Step3 is selected in both Step4 and Step5. Step3 will be executed immediately after Step4, which is not correct. Step3 will be executed again after Step5, which is not correct either. 4. The correct schedule is that Step3 is executed only after both Step4 and Step5. 5. So, I added a check in the snippet as below: if (!driverContext.getRunnable().contains(tsk)) { console.printInfo(tsk.getId() + " is selected by condition resolver."); if (DriverContext.isLaunchable(tsk)) { driverContext.addToRunnable(tsk); } } This works correctly in my environment. I am not sure whether it could cause problems under other conditions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6831) The job schedule in condition task could not be correct with skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962637#comment-13962637 ] william zhu commented on HIVE-6831: --- The problem is that the union operator is resolved into two steps (A, B). The two steps have one child step (C); C aggregates the output of steps A and B. And in the skew join condition task, C will be selected without any check that its parent steps (A, B) have been completed. The job schedule in condition task could not be correct with skewed join optimization - Key: HIVE-6831 URL: https://issues.apache.org/jira/browse/HIVE-6831 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Environment: Hive 0.11.0 Reporter: william zhu Attachments: 6831.patch Code snippet in ConditionalTask.java as below: // resolved task if (driverContext.addToRunnable(tsk)) { console.printInfo(tsk.getId() + " is selected by condition resolver."); } The selected task is added into the runnable queue immediately without any dependency checking. If the selected task is the original task, and its parent task has not been executed, then the result will be incorrect. Like this: 1. Before skew join optimization: Step1, Step2 -- Step3 (Step1 and Step2 are Step3's parents) 2. After skew join optimization: Step1 - Step4 (ConditionTask) - consists of [Step3, Step10] Step2 - Step5 (ConditionTask) - consists of [Step3, Step11] 3. Running: Step3 is selected in both Step4 and Step5. Step3 will be executed immediately after Step4, which is not correct. Step3 will be executed again after Step5, which is not correct either. 4. The correct schedule is that Step3 is executed only after both Step4 and Step5. 5. So, I added a check in the snippet as below: if (!driverContext.getRunnable().contains(tsk)) { console.printInfo(tsk.getId() + " is selected by condition resolver."); if (DriverContext.isLaunchable(tsk)) { driverContext.addToRunnable(tsk); } } This works correctly in my environment. I am not sure whether it could cause problems under other conditions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6857) Refactor HiveServer2 TSetIpAddressProcessor
[ https://issues.apache.org/jira/browse/HIVE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6857: --- Summary: Refactor HiveServer2 TSetIpAddressProcessor (was: Refactor HiveServer2 threadlocals) Refactor HiveServer2 TSetIpAddressProcessor --- Key: HIVE-6857 URL: https://issues.apache.org/jira/browse/HIVE-6857 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Excerpt HIVE-6837. Issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6857) Consolidate HiveServer2 threadlocals
[ https://issues.apache.org/jira/browse/HIVE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6857: --- Description: Excerpt HIVE-6837. Issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 was:Check the discussion here: HIVE-6837 Consolidate HiveServer2 threadlocals Key: HIVE-6857 URL: https://issues.apache.org/jira/browse/HIVE-6857 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Excerpt HIVE-6837. Issues: 1. 
SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6857) Refactor HiveServer2 threadlocals
[ https://issues.apache.org/jira/browse/HIVE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6857: --- Summary: Refactor HiveServer2 threadlocals (was: Consolidate HiveServer2 threadlocals) Refactor HiveServer2 threadlocals - Key: HIVE-6857 URL: https://issues.apache.org/jira/browse/HIVE-6857 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Excerpt HIVE-6837. Issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6857) Refactor HiveServer2 TSetIpAddressProcessor
[ https://issues.apache.org/jira/browse/HIVE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962641#comment-13962641 ] Vaibhav Gumashta commented on HIVE-6857: [~thejas] I am just using this is a placeholder of issues I notice wrt TSetIpAddressProcessor, threadlocals. You might want to take a look. Some of these I'll resolve as part of HIVE-6864. Refactor HiveServer2 TSetIpAddressProcessor --- Key: HIVE-6857 URL: https://issues.apache.org/jira/browse/HIVE-6857 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Excerpt HIVE-6837. Issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6857) Refactor HiveServer2 TSetIpAddressProcessor
[ https://issues.apache.org/jira/browse/HIVE-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6857: --- Description: Excerpt from HIVE-6837 and related issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 was: Excerpt HIVE-6837. Issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean withImpersonation, String delegationToken) throws HiveSQLException { HiveSession session; if (withImpersonation) { HiveSessionImplwithUGI hiveSessionUgi = new HiveSessionImplwithUGI(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress(), delegationToken); session = HiveSessionProxy.getProxy(hiveSessionUgi, hiveSessionUgi.getSessionUgi()); hiveSessionUgi.setProxySession(session); } else { session = new HiveSessionImpl(protocol, username, password, hiveConf, sessionConf, TSetIpAddressProcessor.getUserIpAddress()); } session.setSessionManager(this); session.setOperationManager(operationManager); session.open(); handleToSession.put(session.getSessionHandle(), session); try { executeSessionHooks(session); } catch (Exception e) { throw new HiveSQLException(Failed to execute session hooks, e); } return session.getSessionHandle(); } {code} Notice that if withImpersonation is set to true, we're using TSetIpAddressProcessor.getUserIpAddress() to get the IP address which is wrong for a kerberized setup (should use HiveAuthFactory#getIpAddress). 2. 
Also, in case of a kerberized setup, we're wrapping the transport in a doAs (with UGI of the HiveServer2 process) which doesn't make sense to me: https://github.com/apache/hive/blob/trunk/shims/common-secure/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java#L335. 3. The name TSetIpAddressProcessor should be replaced with something more meaningful like TPlainSASLProcessor. 4. Consolidate thread locals used for username, ipaddress 5. Do not directly use TSetIpAddressProcessor; get it via factory like here: https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java#L161 Refactor HiveServer2 TSetIpAddressProcessor --- Key: HIVE-6857 URL: https://issues.apache.org/jira/browse/HIVE-6857 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Excerpt from HIVE-6837 and related issues: 1. SessionManager#openSession: {code} public SessionHandle openSession(TProtocolVersion protocol, String username, String password, MapString, String sessionConf, boolean
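To make item 1 above concrete, here is a rough sketch of the kind of change being suggested: prefer the IP address from the auth layer over the TSetIpAddressProcessor thread local when the transport is kerberized. This is not the actual patch; the authentication check and the exact signature of HiveAuthFactory#getIpAddress are assumptions made for illustration.
{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.service.auth.HiveAuthFactory;
import org.apache.hive.service.auth.TSetIpAddressProcessor;

// Hypothetical helper for SessionManager; names and wiring are illustrative.
public class ClientIpResolver {
  public static String resolveClientIpAddress(HiveConf hiveConf, HiveAuthFactory authFactory) {
    String authType = hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION);
    if ("KERBEROS".equalsIgnoreCase(authType)) {
      // Assumed call; the description only names HiveAuthFactory#getIpAddress.
      return authFactory.getIpAddress();
    }
    // Plain SASL path: TSetIpAddressProcessor recorded the client IP in a thread local.
    return TSetIpAddressProcessor.getUserIpAddress();
  }
}
{code}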
[jira] [Updated] (HIVE-6782) HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error
[ https://issues.apache.org/jira/browse/HIVE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6782: - Attachment: HIVE-6782.11.patch Address Lefty's comment. HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error - Key: HIVE-6782 URL: https://issues.apache.org/jira/browse/HIVE-6782 Project: Hive Issue Type: Bug Components: Tez Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0, 0.14.0 Attachments: HIVE-6782.1.patch, HIVE-6782.10.patch, HIVE-6782.11.patch, HIVE-6782.2.patch, HIVE-6782.3.patch, HIVE-6782.4.patch, HIVE-6782.5.patch, HIVE-6782.6.patch, HIVE-6782.7.patch, HIVE-6782.8.patch, HIVE-6782.9.patch HiveServer2 concurrency is failing intermittently when using tez, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6782) HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error
[ https://issues.apache.org/jira/browse/HIVE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962726#comment-13962726 ] Hive QA commented on HIVE-6782: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639082/HIVE-6782.10.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5549 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.security.TestMetastoreAuthorizationProvider.testSimplePrivileges {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2171/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2171/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639082 HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error - Key: HIVE-6782 URL: https://issues.apache.org/jira/browse/HIVE-6782 Project: Hive Issue Type: Bug Components: Tez Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0, 0.14.0 Attachments: HIVE-6782.1.patch, HIVE-6782.10.patch, HIVE-6782.11.patch, HIVE-6782.2.patch, HIVE-6782.3.patch, HIVE-6782.4.patch, HIVE-6782.5.patch, HIVE-6782.6.patch, HIVE-6782.7.patch, HIVE-6782.8.patch, HIVE-6782.9.patch HiveServer2 concurrency is failing intermittently when using tez, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6866) Hive server2 jdbc driver connection leak with namenode
Shengjun Xin created HIVE-6866: -- Summary: Hive server2 jdbc driver connection leak with namenode Key: HIVE-6866 URL: https://issues.apache.org/jira/browse/HIVE-6866 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Shengjun Xin 1. Set 'ipc.client.connection.maxidletime' to 360 in core-site.xml and start hive-server2 2. Connect hive server2 continuously 3. It seems that hive server2 will not close the connection until the time out, the error message is as the following: {code} 2014-03-18 23:30:36,873 ERROR ql.Driver (SessionState.java:printError(386)) - FAILED: RuntimeException java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:190) at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231) at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8676) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:95) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:181) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:40) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:37) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:37) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: 
hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:483) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2259)
[jira] [Updated] (HIVE-6866) Hive server2 jdbc driver connection leak with namenode
[ https://issues.apache.org/jira/browse/HIVE-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengjun Xin updated HIVE-6866: --- Description: 1. Set 'ipc.client.connection.maxidletime' to 360 in core-site.xml and start hive-server2. 2. Connect hive server2 repetitively in a while true loop. 3. The tcp connection number will increase until out of memory, it seems that hive server2 will not close the connection until the time out, the error message is as the following: {code} 2014-03-18 23:30:36,873 ERROR ql.Driver (SessionState.java:printError(386)) - FAILED: RuntimeException java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:190) at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231) at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8676) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:95) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:181) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:40) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:37) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:37) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: 
hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:483) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2259) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2230)
[jira] [Updated] (HIVE-6866) Hive server2 jdbc driver connection leak with namenode
[ https://issues.apache.org/jira/browse/HIVE-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengjun Xin updated HIVE-6866: --- Description: 1. Set 'ipc.client.connection.maxidletime' to 360 in core-site.xml and start hive-server2. 2. Connect hive server2 repetitively in a while true loop. 3. It seems that hive server2 will not close the connection until the time out, the error message is as the following: {code} 2014-03-18 23:30:36,873 ERROR ql.Driver (SessionState.java:printError(386)) - FAILED: RuntimeException java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:190) at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231) at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8676) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:95) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:181) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:40) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:37) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:37) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:483) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2259) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2230) at
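As an illustration of steps 1-3 in the description above, a client loop of the kind described might look like the following sketch. The host, port, credentials, and query are placeholders; the point is only that each iteration opens a fresh HiveServer2 session, which in turn makes the server open connections to the NameNode that linger until ipc.client.connection.maxidletime.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveServer2ConnectLoop {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    while (true) {
      // Each new session causes HiveServer2 to touch its HDFS scratch dir,
      // opening an IPC connection to the NameNode on the server side.
      try (Connection conn = DriverManager.getConnection(
               "jdbc:hive2://hdm1.hadoop.local:10000/default", "hive", "");
           Statement stmt = conn.createStatement()) {
        stmt.execute("show tables");
      }
      // The client closes cleanly; the reported leak is on the server side,
      // where NameNode connections accumulate until the idle timeout fires.
    }
  }
}
{code}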
[jira] [Commented] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962753#comment-13962753 ] Jason Dere commented on HIVE-6858: -- Would you be able to fix groupby3_map_skew.q as well, which looks like it also has a similar issue? For that one maybe you could replace: SELECT dest1.* FROM dest1; with: SELECT c1, c2, c3, c4, c5, c6, c7, ROUND(c8, 5), ROUND(c9, 5) FROM dest1; And hopefully the values generated do not show differences between the jdk6/7 formatting. Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run with jdk-7 vs jdk-6: jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962755#comment-13962755 ] Lars Francke commented on HIVE-5687: Thanks, could you put a new version up on RB? Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
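For readers skimming this thread, a rough sketch of how a client might drive the proposed streaming ingest API follows. It is based on the attached API spec drafts and assumes the org.apache.hive.hcatalog.streaming package; the metastore URI, table, partition, and field names are placeholders, and class or method names may differ from the final committed patch.
{code}
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingIngestExample {
  public static void main(String[] args) throws Exception {
    // Target an ACID, bucketed table; the partition values pick the partition to stream into.
    HiveEndPoint endPoint = new HiveEndPoint(
        "thrift://metastore-host:9083", "default", "web_logs",
        java.util.Arrays.asList("2014-04-08"));
    StreamingConnection connection = endPoint.newConnection(true); // create partition if missing

    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"ip", "url"}, ",", endPoint);

    // Fetch a batch of transactions; each commit makes its records visible to queries.
    TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
    txnBatch.beginNextTransaction();
    txnBatch.write("10.0.0.1,/index.html".getBytes());
    txnBatch.write("10.0.0.2,/about.html".getBytes());
    txnBatch.commit();

    txnBatch.close();
    connection.close();
  }
}
{code}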
[jira] [Commented] (HIVE-6782) HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error
[ https://issues.apache.org/jira/browse/HIVE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962778#comment-13962778 ] Hive QA commented on HIVE-6782: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639155/HIVE-6782.11.patch {color:green}SUCCESS:{color} +1 5549 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2172/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2172/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12639155 HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error - Key: HIVE-6782 URL: https://issues.apache.org/jira/browse/HIVE-6782 Project: Hive Issue Type: Bug Components: Tez Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0, 0.14.0 Attachments: HIVE-6782.1.patch, HIVE-6782.10.patch, HIVE-6782.11.patch, HIVE-6782.2.patch, HIVE-6782.3.patch, HIVE-6782.4.patch, HIVE-6782.5.patch, HIVE-6782.6.patch, HIVE-6782.7.patch, HIVE-6782.8.patch, HIVE-6782.9.patch HiveServer2 concurrency is failing intermittently when using tez, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6825) custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned
[ https://issues.apache.org/jira/browse/HIVE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962947#comment-13962947 ] Hive QA commented on HIVE-6825: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639084/HIVE-6825.01.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5549 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2173/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2173/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639084 custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned Key: HIVE-6825 URL: https://issues.apache.org/jira/browse/HIVE-6825 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-6825.01.patch, HIVE-6825.patch Currently the jars are uploaded to either user directory or global, whatever is configured, which is a mess and can cause collisions. We can upload to scratch directory, and/or version. There's a tradeoff between having to upload files every time (for example, for commonly used things like HBase input format) (which is what is done now, into global/user path), and having a mess of one-off custom jars and files, versioned, sitting in .hiveJars. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6860) Issue with FS based stats collection on Tez
[ https://issues.apache.org/jira/browse/HIVE-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6860: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk 0.13 Issue with FS based stats collection on Tez --- Key: HIVE-6860 URL: https://issues.apache.org/jira/browse/HIVE-6860 Project: Hive Issue Type: Bug Components: Statistics, Tez Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.13.0 Attachments: HIVE-6860.patch Statistics from different tasks got overwritten while running on Tez. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6855) A couple of errors in MySQL db creation script for transaction tables
[ https://issues.apache.org/jira/browse/HIVE-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6855: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to 0.13 trunk. A couple of errors in MySQL db creation script for transaction tables - Key: HIVE-6855 URL: https://issues.apache.org/jira/browse/HIVE-6855 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6855.patch There are a few small issues in the database creation scripts for mysql. A couple of the tables don't set the engine to InnoDB. None of the tables set default character set to latin1. And the syntax CREATE INDEX...USING HASH doesn't work on older versions of MySQL. Instead the index creation should be done without specifying a method (no USING clause). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6113) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HIVE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963055#comment-13963055 ] Eli Acherkan commented on HIVE-6113: The exact same issue reproduces here. Hive 0.12 on MapR 3.1.0 with MySQL metastore. The exception appears when there are several processes working with Hive concurrently. From our analysis the problem seems related to the one described here: http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4f6b25afffcafe44b6259a412d5f9b1033183...@exchmbx104.netflix.com%3E h5. Analysis: At certain times, Hive's DataNucleus decides to create and then drop tables called DELETEME+timestamp in the metastore schema on MySQL (see [ProbeTable|http://sourceforge.net/p/datanucleus/code/HEAD/tree/platform/store.rdbms/tags/datanucleus-rdbms-3.2.2/src/java/org/datanucleus/store/rdbms/table/ProbeTable.java]). During other flows, DataNucleus queries MySQL for the list of all the columns of all the tables (see [RDBMSSchemaHandler.refreshTableData|http://sourceforge.net/p/datanucleus/code/HEAD/tree/platform/store.rdbms/tags/datanucleus-rdbms-3.2.2/src/java/org/datanucleus/store/rdbms/schema/RDBMSSchemaHandler.java#l872]). MySQL's JDBC driver implements the DatabaseMetaData.getColumns method by querying the DB for a list of all the tables, and then iterating over that list and querying for each table's columns (see [com.mysql.jdbc.DatabaseMetaData|http://bazaar.launchpad.net/~mysql/connectorj/5.1/view/head:/src/com/mysql/jdbc/DatabaseMetaData.java#L2581]). If a table is deleted from the DB during this operation, DatabaseMetaData.getColumns will throw an exception. This exception is interpreted by Hive to mean that the default Hive database doesn't exist. Hive tries to create it, inserting a row into the metastore.DBS table in MySQL, which triggers the Duplicate entry 'default' for key 'UNIQUE_DATABASE' exception. I'm not completely clear about the conditions for a) DataNucleus creating and dropping a DELETEME table, and b) DataNucleus calling DatabaseMetaData.getColumns, so unfortunately I can't yet provide a clear test case. But in our lab environment under load we were able to reproduce the exception once every few minutes. h5. Workaround: As suggested by the link above, setting the *datanucleus.fixedDatastore* property to *true* (e.g. in hive-site.xml or elsewhere) seems to solve the problem. However, it means that the metastore schema is no longer automatically created on-demand, and requires using Hive's schematool command to manually create the metastore schema. Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient -- Key: HIVE-6113 URL: https://issues.apache.org/jira/browse/HIVE-6113 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.12.0 Environment: hadoop-0.20.2-cdh3u3,hive-0.12.0 Reporter: William Stone Priority: Critical Labels: HiveMetaStoreClient, metastore, unable_instantiate When I execute the SQL use fdm; desc formatted fdm.tableName; in Python, it throws the error below, but when I try it again, it succeeds.
2013-12-25 03:01:32,290 ERROR exec.DDLTask (DDLTask.java:execute(435)) - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143) at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128) at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:507) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:875) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:769) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:708) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
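To illustrate the workaround mentioned in the comment above: the property would normally go in hive-site.xml, and the sketch below only shows the equivalent programmatic setting. Pairing it with datanucleus.autoCreateSchema=false is an assumption about common practice, not something stated in the comment, and the metastore schema then has to be created up front with the schematool command.
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class FixedDatastoreConfExample {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Workaround from the comment: stop DataNucleus from probing/creating
    // metastore tables (the DELETEME tables) at runtime.
    conf.set("datanucleus.fixedDatastore", "true");
    // Assumption: commonly paired setting so the schema is never auto-created.
    conf.set("datanucleus.autoCreateSchema", "false");
    System.out.println(conf.get("datanucleus.fixedDatastore"));
  }
}
{code}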
[jira] [Updated] (HIVE-6830) After major compaction unable to read from partition with MR job
[ https://issues.apache.org/jira/browse/HIVE-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6830: Resolution: Fixed Status: Resolved (was: Patch Available) The test case passed locally. I just committed this. Thanks for the review, Harish. After major compaction unable to read from partition with MR job Key: HIVE-6830 URL: https://issues.apache.org/jira/browse/HIVE-6830 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Owen O'Malley Priority: Critical Fix For: 0.13.0 Attachments: HIVE-6830.patch After doing a major compaction, any attempt to read the data with an MR job (select count(*), subsequent compaction) fails with: Caused by: java.lang.IllegalArgumentException: All base directories were ignored, such as hdfs://hdp.example.com:8020/apps/hive/warehouse/purchaselog/ds=201404031016/base_0044000 by 5:4086:... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6830) After major compaction unable to read from partition with MR job
[ https://issues.apache.org/jira/browse/HIVE-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963062#comment-13963062 ] Owen O'Malley commented on HIVE-6830: - Sergey, if bestBase is defined it adds whichever is older (either bestBase or child) to obsolete. After major compaction unable to read from partition with MR job Key: HIVE-6830 URL: https://issues.apache.org/jira/browse/HIVE-6830 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Owen O'Malley Priority: Critical Fix For: 0.13.0 Attachments: HIVE-6830.patch After doing a major compaction, any attempt to read the data with an MR job (select count(*), subsequent compaction) fails with: Caused by: java.lang.IllegalArgumentException: All base directories were ignored, such as hdfs://hdp.example.com:8020/apps/hive/warehouse/purchaselog/ds=201404031016/base_0044000 by 5:4086:... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6787) ORC+ACID assumes all missing buckets are in ACID structure
[ https://issues.apache.org/jira/browse/HIVE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6787: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review, Sergey! I just committed this. ORC+ACID assumes all missing buckets are in ACID structure -- Key: HIVE-6787 URL: https://issues.apache.org/jira/browse/HIVE-6787 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.0 Reporter: Gopal V Assignee: Owen O'Malley Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6787.patch ORC+ACID creates ACID structure splits for all missing buckets in a table {code} java.lang.RuntimeException: java.io.IOException: java.io.IOException: Vectorization and ACID tables are incompatible. at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:996) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:240) ... 15 more {code} The tables are normal ORC tables and are not using ACID structure at all. {code} @@ -539,7 +539,7 @@ public void run() { for(int b=0; b < context.numBuckets; ++b) { if (!covered[b]) { context.splits.add(new OrcSplit(dir, b, 0, new String[0], null, - false, false, deltas)); + isOriginal, false, deltas)); } } {code} seems to fix the issue. [~owen.omalley], please confirm if that is what I should be doing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6846) allow safe set commands with sql standard authorization
[ https://issues.apache.org/jira/browse/HIVE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963069#comment-13963069 ] Hive QA commented on HIVE-6846: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639099/HIVE-6846.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5553 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2174/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2174/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639099 allow safe set commands with sql standard authorization --- Key: HIVE-6846 URL: https://issues.apache.org/jira/browse/HIVE-6846 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6846.1.patch, HIVE-6846.2.patch, HIVE-6846.3.patch HIVE-6827 disables all set commands when SQL standard authorization is turned on, but not all set commands are unsafe. We should allow safe set commands. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6757: Assignee: Harish Butani (was: Owen O'Malley) Remove deprecated parquet classes from outside of org.apache package Key: HIVE-6757 URL: https://issues.apache.org/jira/browse/HIVE-6757 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Harish Butani Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch Apache shouldn't release projects with files outside of the org.apache namespace. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6757) Remove deprecated parquet classes from outside of org.apache package
[ https://issues.apache.org/jira/browse/HIVE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6757: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and 0.13 thanks Owen, Xuefu, Brock, Justin. Remove deprecated parquet classes from outside of org.apache package Key: HIVE-6757 URL: https://issues.apache.org/jira/browse/HIVE-6757 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Harish Butani Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6757.2.patch, HIVE-6757.patch, parquet-hive.patch Apache shouldn't release projects with files outside of the org.apache namespace. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4904) A little more CP crossing RS boundaries
[ https://issues.apache.org/jira/browse/HIVE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963115#comment-13963115 ] Ashutosh Chauhan commented on HIVE-4904: +1 A little more CP crossing RS boundaries --- Key: HIVE-4904 URL: https://issues.apache.org/jira/browse/HIVE-4904 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4904.3.patch, HIVE-4904.4.patch, HIVE-4904.5.patch, HIVE-4904.D11757.1.patch, HIVE-4904.D11757.2.patch Currently, CP context cannot be propagated over RS except for JOIN/EXT. A little more CP is possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6818) Array out of bounds when ORC is used with ACID and predicate push down
[ https://issues.apache.org/jira/browse/HIVE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963110#comment-13963110 ] Owen O'Malley commented on HIVE-6818: - The three failures are unrelated and pass when I run them locally. I'll commit this after 24 hours. Array out of bounds when ORC is used with ACID and predicate push down -- Key: HIVE-6818 URL: https://issues.apache.org/jira/browse/HIVE-6818 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6818.patch The user gets an ArrayOutOfBoundsException when using ORC, ACID, and predicate push down. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963187#comment-13963187 ] Hive QA commented on HIVE-6858: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639098/HIVE-6858.1.patch {color:green}SUCCESS:{color} +1 5550 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2175/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2175/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12639098 Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run in jdk-7 vs jdk-6. Jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
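A self-contained way to see the behavior difference quoted above, plus one deterministic workaround (pinning the digit count instead of relying on Double.toString), is sketched below. This is only an illustration of the JDK quirk, not the fix applied in the HIVE-6858 patch.
{code}
public class DoubleToStringDemo {
  public static void main(String[] args) {
    double val = 0.004;
    // Double.toString: jdk6 can print "0.0040" (extra trailing digit, JDK bug 4511638),
    // while jdk7 prints the shortest round-trippable form "0.004".
    System.out.println("Value = " + val);
    // Pinning the digit count with String.format yields the same text on both JDKs.
    System.out.println("Value = " + String.format("%.4f", val));
  }
}
{code}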
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963192#comment-13963192 ] Sergey Shelukhin commented on HIVE-6430: Tested the patch on real queries. I do see huge memory reduction (modified TPCDS query 72, worst map task goes from 7Gb to ~1.2Gb dump after populating hash tables, I'll need to download the dumps to analyze but it's pretty clear cut); and GC time counter goes down from ~1min total to few seconds, as expected, but I also see huge wall clock time increase (without corresponding CPU time increase it looks like) during processing. I would expect some tradeoff but not as much as I'm seeing... will profile more. MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
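The "primitive-friendly hashtable" idea referenced in the HIVE-6430 description can be pictured as an open-addressing map from a long key to an int slot, backed by parallel arrays so there is no object per entry. The sketch below is a generic illustration in the spirit of HPPC-style collections, not Hive's eventual implementation; it assumes a power-of-two capacity, key 0 reserved as "empty", and no resizing.
{code}
// Illustrative open-addressing long -> int map with parallel primitive arrays.
class LongIntOpenMap {
  private final long[] keys;
  private final int[] values;
  private final int mask;

  LongIntOpenMap(int capacityPowerOfTwo) {
    keys = new long[capacityPowerOfTwo];
    values = new int[capacityPowerOfTwo];
    mask = capacityPowerOfTwo - 1;
  }

  void put(long key, int value) {
    int slot = mix(key) & mask;
    while (keys[slot] != 0 && keys[slot] != key) {
      slot = (slot + 1) & mask;              // linear probing, no per-entry object
    }
    keys[slot] = key;
    values[slot] = value;
  }

  int get(long key, int defaultValue) {
    int slot = mix(key) & mask;
    while (keys[slot] != 0) {
      if (keys[slot] == key) {
        return values[slot];
      }
      slot = (slot + 1) & mask;
    }
    return defaultValue;
  }

  private static int mix(long key) {
    long h = key * 0x9E3779B97F4A7C15L;      // cheap multiplicative hash
    return (int) (h ^ (h >>> 32));
  }
}
{code}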
[jira] [Commented] (HIVE-6809) Support bulk deleting directories for partition drop with partial spec
[ https://issues.apache.org/jira/browse/HIVE-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963194#comment-13963194 ] Sergey Shelukhin commented on HIVE-6809: Can you update RB also? Support bulk deleting directories for partition drop with partial spec -- Key: HIVE-6809 URL: https://issues.apache.org/jira/browse/HIVE-6809 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Attachments: HIVE-6809.1.patch.txt, HIVE-6809.2.patch.txt, HIVE-6809.3.patch.txt, HIVE-6809.4.patch.txt In busy hadoop system, dropping many of partitions takes much more time than expected. In hive-0.11.0, removing 1700 partitions by single partial spec took 90 minutes, which is reduced to 3 minutes when deleteData is set false. I couldn't test this in recent hive, which has HIVE-6256 but if the time-taking part is mostly from removing directories, it seemed not helpful to reduce whole processing time. -- This message was sent by Atlassian JIRA (v6.2#6252)
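To illustrate the direction discussed in HIVE-6809 (collect the partition directories first, then remove them in one batched pass instead of a full drop-partition round trip per partition), here is a minimal sketch using plain Hadoop FileSystem calls. It is not Navis's patch; the metastore, listener, and trash handling are deliberately omitted, and the class and method names are invented.
{code}
// Minimal sketch of batched partition-directory removal (assumed usage, not the actual change).
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class BulkPartitionDirDeleter {
  static void deleteAll(List<Path> partitionDirs, Configuration conf) throws IOException {
    for (Path dir : partitionDirs) {
      FileSystem fs = dir.getFileSystem(conf);   // resolved per path; cheap due to the FS cache
      if (fs.exists(dir)) {
        fs.delete(dir, true);                    // recursive delete of the partition directory
      }
    }
  }
}
{code}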
[jira] [Commented] (HIVE-6825) custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned
[ https://issues.apache.org/jira/browse/HIVE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963205#comment-13963205 ] Sergey Shelukhin commented on HIVE-6825: The test failure looks unrelated and the test passes for me locally. Will commit today late afternoon (after 24h) custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned Key: HIVE-6825 URL: https://issues.apache.org/jira/browse/HIVE-6825 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-6825.01.patch, HIVE-6825.patch Currently the jars are uploaded to either user directory or global, whatever is configured, which is a mess and can cause collisions. We can upload to scratch directory, and/or version. There's a tradeoff between having to upload files every time (for example, for commonly used things like HBase input format) (which is what is done now, into global/user path), and having a mess of one-off custom jars and files, versioned, sitting in .hiveJars. -- This message was sent by Atlassian JIRA (v6.2#6252)
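One way to picture the "versioned" option from the HIVE-6825 description is to name the uploaded jar after a digest of its contents, so an unchanged jar is uploaded once and reused while a changed jar gets a new path. The sketch below only illustrates that idea; the destination directory and helper names are hypothetical, not the actual Hive code.
{code}
// Illustration only: derive a content-addressed destination for a local jar and upload it
// when it is not already present. Paths and class names are made up for the sketch.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class VersionedJarUploader {
  static Path ensureUploaded(String localJar, Path jarDir, Configuration conf)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    try (InputStream in = Files.newInputStream(Paths.get(localJar))) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) > 0; ) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    Path dest = new Path(jarDir, hex + ".jar");      // same contents -> same destination
    FileSystem fs = dest.getFileSystem(conf);
    if (!fs.exists(dest)) {                          // upload only when not already there
      fs.copyFromLocalFile(new Path(localJar), dest);
    }
    return dest;
  }
}
{code}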
[jira] [Updated] (HIVE-6846) allow safe set commands with sql standard authorization
[ https://issues.apache.org/jira/browse/HIVE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6846: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.13 trunk. allow safe set commands with sql standard authorization --- Key: HIVE-6846 URL: https://issues.apache.org/jira/browse/HIVE-6846 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6846.1.patch, HIVE-6846.2.patch, HIVE-6846.3.patch HIVE-6827 disables all set commands when SQL standard authorization is turned on, but not all set commands are unsafe. We should allow safe set commands. -- This message was sent by Atlassian JIRA (v6.2#6252)
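A simple way to think about "safe set commands" is a whitelist check applied before a set command is rejected under SQL standard authorization. The sketch below illustrates that shape only; the parameter names in the whitelist are examples, not the list shipped in the HIVE-6846 patch.
{code}
// Illustrative whitelist check for set commands under SQL standard authorization.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetCommandGuard {
  private static final Set<String> SAFE_PARAMS = new HashSet<>(Arrays.asList(
      "hive.exec.reducers.bytes.per.reducer",   // example entries only
      "hive.exec.reducers.max",
      "mapreduce.job.reduces"));

  static void checkAllowed(String param, boolean sqlStdAuthEnabled) {
    if (!sqlStdAuthEnabled) {
      return;                                   // no restriction when the mode is off
    }
    if (!SAFE_PARAMS.contains(param.toLowerCase())) {
      throw new IllegalArgumentException(
          "Cannot modify " + param + " at runtime when SQL standard authorization is enabled");
    }
  }
}
{code}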
[jira] [Updated] (HIVE-6604) Fix vectorized input to work with ACID
[ https://issues.apache.org/jira/browse/HIVE-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6604: Attachment: HIVE-6604.patch This patch: * Adds support for Decimal to the VectorizedBatchUtil.addRowToBatch. * Makes addRowToBatch copy the bytes for Strings to avoid them being overwritten by the next value. * Adds a unit test case with ACID, vectorization, and all of the handled types. * Fixes some of the method names to use proper capitalization. * Removes the unused parameter to setNullColIsNullValue. * Adds important tracking of the number of inserts, updates, and deletes. * Fixes WriterImpl.writeIntermediateFooter to notify the callback. Fix vectorized input to work with ACID -- Key: HIVE-6604 URL: https://issues.apache.org/jira/browse/HIVE-6604 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6604.patch, HIVE-6604.patch Fix the VectorizedOrcInputFormat to work with the ACID directories. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6604) Fix vectorized input to work with ACID
[ https://issues.apache.org/jira/browse/HIVE-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963277#comment-13963277 ] Owen O'Malley commented on HIVE-6604: - Jitendra, in regards to your other comments: * For each row returned by next, it is added to the batch. * Until we add ACID update and delete into Hive's SQL, we can't make a qfile test for this. Fix vectorized input to work with ACID -- Key: HIVE-6604 URL: https://issues.apache.org/jira/browse/HIVE-6604 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6604.patch, HIVE-6604.patch Fix the VectorizedOrcInputFormat to work with the ACID directories. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 19754: Defines a api for streaming data into Hive using ACID support.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19754/ --- (Updated April 8, 2014, 6:27 p.m.) Review request for hive. Changes --- addressing review comments. - move to hcatalog - expose HiveConf to client API Bugs: HIVE-5687 https://issues.apache.org/jira/browse/HIVE-5687 Repository: hive-git Description --- Defines an API for streaming data into Hive using ACID support. Diffs (updated) - hcatalog/pom.xml 50ce296 hcatalog/streaming/pom.xml PRE-CREATION hcatalog/streaming/src/docs/package.html PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/ConnectionError.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HeartBeatFailure.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/ImpersonationFailed.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidColumn.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidPartition.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidTable.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidTrasactionState.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/PartitionCreationFailed.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/QueryFailedException.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/RecordWriter.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/SerializationError.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingConnection.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingException.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingIOFailure.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionBatch.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionBatchUnAvailable.java PRE-CREATION hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionError.java PRE-CREATION hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/StreamingIntegrationTester.java PRE-CREATION hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestDelimitedInputWriter.java PRE-CREATION hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java PRE-CREATION hcatalog/streaming/src/test/sit PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 1bbe02e packaging/pom.xml de9b002 packaging/src/main/assembly/src.xml bdaa47b Diff: https://reviews.apache.org/r/19754/diff/ Testing --- Unit tests included. Also done manual testing by streaming data using flume. Thanks, Roshan Naik
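For orientation, here is a rough end-to-end usage sketch of the classes listed in this diff (HiveEndPoint, StreamingConnection, TransactionBatch, DelimitedInputWriter). The constructor and method signatures are assumptions inferred from the class names and the attached API spec, so treat this as pseudocode rather than the final interface.
{code}
// Assumed usage of the streaming classes named in this review; signatures are a best guess.
import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingSketch {
  public static void main(String[] args) throws Exception {
    HiveEndPoint endPoint = new HiveEndPoint(
        "thrift://metastore-host:9083", "default", "alerts", Arrays.asList("2014", "04"));
    StreamingConnection conn = endPoint.newConnection(true);   // create partition if missing
    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"id", "msg"}, ",", endPoint);
    TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
    while (batch.remainingTransactions() > 0) {
      batch.beginNextTransaction();
      batch.write("1,hello".getBytes("UTF-8"));
      batch.write("2,world".getBytes("UTF-8"));
      batch.commit();                                          // rows become visible here
    }
    batch.close();
    conn.close();
  }
}
{code}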
[jira] [Updated] (HIVE-6782) HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error
[ https://issues.apache.org/jira/browse/HIVE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6782: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to both trunk and branch-0.13 HiveServer2Concurrency issue when running with tez intermittently, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error - Key: HIVE-6782 URL: https://issues.apache.org/jira/browse/HIVE-6782 Project: Hive Issue Type: Bug Components: Tez Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.13.0, 0.14.0 Attachments: HIVE-6782.1.patch, HIVE-6782.10.patch, HIVE-6782.11.patch, HIVE-6782.2.patch, HIVE-6782.3.patch, HIVE-6782.4.patch, HIVE-6782.5.patch, HIVE-6782.6.patch, HIVE-6782.7.patch, HIVE-6782.8.patch, HIVE-6782.9.patch HiveServer2 concurrency is failing intermittently when using tez, throwing org.apache.tez.dag.api.SessionNotRunning: Application not running error -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6759) Fix reading partial ORC files while they are being written
[ https://issues.apache.org/jira/browse/HIVE-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6759: Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review, Sergey and Harish. Thejas ran the unit tests internally and they passed. I just committed this. Fix reading partial ORC files while they are being written -- Key: HIVE-6759 URL: https://issues.apache.org/jira/browse/HIVE-6759 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Critical Fix For: 0.13.0 Attachments: HIVE-6759.patch HDFS with the hflush ensures the bytes are visible, but doesn't update the file length on the NameNode. Currently the Orc reader will only read up to the length on the NameNode. If the user specified a length from a flush_length file, the Orc reader should trust it to be right. -- This message was sent by Atlassian JIRA (v6.2#6252)
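The idea in the HIVE-6759 description can be illustrated as: read the side file that records the last hflushed length and let that value, rather than the NameNode-reported length, bound how far the reader goes. The snippet below is schematic only; the side-file naming and helper class are assumptions, not the committed change.
{code}
// Schematic only: choose the reader's upper bound from a side "flush length" file when one
// exists, otherwise fall back to the length the NameNode reports.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FlushLengthHelper {
  static long readableLength(FileSystem fs, Path orcFile) throws IOException {
    Path sideFile = new Path(orcFile.getParent(), orcFile.getName() + "_flush_length");
    if (fs.exists(sideFile)) {
      try (FSDataInputStream in = fs.open(sideFile)) {
        return in.readLong();                      // trust the writer-recorded flushed length
      }
    }
    return fs.getFileStatus(orcFile).getLen();     // NameNode length as the fallback
  }
}
{code}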
[jira] [Commented] (HIVE-6822) TestAvroSerdeUtils fails with -Phadoop-2
[ https://issues.apache.org/jira/browse/HIVE-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963294#comment-13963294 ] Ashutosh Chauhan commented on HIVE-6822: +1 TestAvroSerdeUtils fails with -Phadoop-2 Key: HIVE-6822 URL: https://issues.apache.org/jira/browse/HIVE-6822 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6822.1.patch Works fine with -Phadoop-1, but with -Phadoop-2 hits the following error: {noformat} Running org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.603 sec FAILURE! - in org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils determineSchemaCanReadSchemaFromHDFS(org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils) Time elapsed: 0.688 sec ERROR! java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at org.apache.hadoop.http.HttpServer2.addJerseyResourcePackage(HttpServer2.java:564) at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.initWebHdfs(NameNodeHttpServer.java:84) at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:121) at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:601) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:658) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:643) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1259) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:914) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:805) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:663) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:603) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:474) at org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS(TestAvroSerdeUtils.java:189) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6845) TestJdbcDriver.testShowRoleGrant can fail if TestJdbcDriver/TestJdbcDriver2 run together
[ https://issues.apache.org/jira/browse/HIVE-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6845: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to 0.13 trunk. TestJdbcDriver.testShowRoleGrant can fail if TestJdbcDriver/TestJdbcDriver2 run together Key: HIVE-6845 URL: https://issues.apache.org/jira/browse/HIVE-6845 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.13.0 Attachments: HIVE-6845.1.patch Running both TestJdbcDriver/TestJdbcDriver2 together in the same run gives an error in testShowRoleGrant() because both tests create the role role1. When the 2nd test tries to create the role it fails: {noformat} testShowRoleGrant(org.apache.hive.jdbc.TestJdbcDriver2) Time elapsed: 1.801 sec ERROR! java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275) at org.apache.hive.jdbc.TestJdbcDriver2.testShowRoleGrant(TestJdbcDriver2.java:2000) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6853) show create table for hbase tables should exclude LOCATION
[ https://issues.apache.org/jira/browse/HIVE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963316#comment-13963316 ] Szehon Ho commented on HIVE-6853: - Thanks for the fix, only one minor comment, is it needed to make a StringBuilder when there is only one string to return? Also can you upload the patch in right name-format? The precommit test takes patches in the form HIVE-.patch or HIVE-.n.patch only. show create table for hbase tables should exclude LOCATION --- Key: HIVE-6853 URL: https://issues.apache.org/jira/browse/HIVE-6853 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.10.0 Reporter: Miklos Christine Attachments: HIVE-6853-0.patch If you create a table on top of hbase in hive and issue a show create table hbase_table, it gives a bad DDL. It should not show LOCATION: [hive]$ cat /tmp/test_create.sql CREATE EXTERNAL TABLE nba_twitter.hbase2( key string COMMENT 'from deserializer', name string COMMENT 'from deserializer', pdt string COMMENT 'from deserializer', service string COMMENT 'from deserializer', term string COMMENT 'from deserializer', update1 string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'serialization.format'='1', 'hbase.columns.mapping'=':key,srv:name,srv:pdt,srv:service,srv:term,srv:update') LOCATION 'hdfs://nameservice1/user/hive/warehouse/nba_twitter.db/hbase' TBLPROPERTIES ( 'hbase.table.name'='NBATwitter', 'transient_lastDdlTime'='1386172188') Trying to create a table using the above fails: [hive]$ hive -f /tmp/test_create.sql cli -f /tmp/test_create.sql Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties FAILED: Error in metadata: MetaException(message:LOCATION may not be specified for HBase.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask However, if I remove the LOCATION, then the DDL is valid. -- This message was sent by Atlassian JIRA (v6.2#6252)
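The shape of the fix being discussed can be sketched as: when the table is backed by a storage handler (like the HBase handler here), simply skip emitting the LOCATION clause. The snippet below is an illustration with an assumed helper flag, not the submitted patch.
{code}
// Illustration of the discussed behavior: emit LOCATION only for native tables.
// "isNonNativeTable" stands in for however the real code detects a storage handler.
class ShowCreateTableSketch {
  static String locationClause(String location, boolean isNonNativeTable) {
    if (isNonNativeTable || location == null) {
      return "";                                // HBase-backed tables reject LOCATION on create
    }
    return "LOCATION\n  '" + location + "'\n";
  }
}
{code}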
[jira] [Updated] (HIVE-6812) show compactions returns error when there are no compactions
[ https://issues.apache.org/jira/browse/HIVE-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6812: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to 0.13 trunk. show compactions returns error when there are no compactions Key: HIVE-6812 URL: https://issues.apache.org/jira/browse/HIVE-6812 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.13.0 Attachments: HIVE-6812.patch Doing show compactions when there are no current transactions in process or in the queue results in: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. null -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6809) Support bulk deleting directories for partition drop with partial spec
[ https://issues.apache.org/jira/browse/HIVE-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1396#comment-1396 ] Hive QA commented on HIVE-6809: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639100/HIVE-6809.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5551 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2176/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2176/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639100 Support bulk deleting directories for partition drop with partial spec -- Key: HIVE-6809 URL: https://issues.apache.org/jira/browse/HIVE-6809 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Attachments: HIVE-6809.1.patch.txt, HIVE-6809.2.patch.txt, HIVE-6809.3.patch.txt, HIVE-6809.4.patch.txt In busy hadoop system, dropping many of partitions takes much more time than expected. In hive-0.11.0, removing 1700 partitions by single partial spec took 90 minutes, which is reduced to 3 minutes when deleteData is set false. I couldn't test this in recent hive, which has HIVE-6256 but if the time-taking part is mostly from removing directories, it seemed not helpful to reduce whole processing time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6604) Fix vectorized input to work with ACID
[ https://issues.apache.org/jira/browse/HIVE-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963335#comment-13963335 ] Jitendra Nath Pandey commented on HIVE-6604: +1 Fix vectorized input to work with ACID -- Key: HIVE-6604 URL: https://issues.apache.org/jira/browse/HIVE-6604 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-6604.patch, HIVE-6604.patch Fix the VectorizedOrcInputFormat to work with the ACID directories. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6773) Update readme for ptest2 framework
[ https://issues.apache.org/jira/browse/HIVE-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6773: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Update readme for ptest2 framework -- Key: HIVE-6773 URL: https://issues.apache.org/jira/browse/HIVE-6773 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Priority: Minor Fix For: 0.14.0 Attachments: HIVE-6773.patch Approvals dependency is needed for testing. Need to add instructions. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6844) support separate configuration param for enabling authorization using new interface
[ https://issues.apache.org/jira/browse/HIVE-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963364#comment-13963364 ] Thejas M Nair commented on HIVE-6844: - I think the configuration doc is fine. It was just me not RTFM ! :) support separate configuration param for enabling authorization using new interface --- Key: HIVE-6844 URL: https://issues.apache.org/jira/browse/HIVE-6844 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair The existing configuration parameter *hive.security.authorization.enabled* is used for both SQL query level authorization at sql query compilation, and at metatore api authorization for the thrift metastore api calls. This makes it hard to flexibly/correctly configure the security settings. It should be possible to enable SQL query level authorization and metastore api authorization independently of each other. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6863) HiveServer2 binary mode throws exception with PAM
[ https://issues.apache.org/jira/browse/HIVE-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963387#comment-13963387 ] Thejas M Nair commented on HIVE-6863: - +1 HiveServer2 binary mode throws exception with PAM - Key: HIVE-6863 URL: https://issues.apache.org/jira/browse/HIVE-6863 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6863.1.patch Works fine in http mode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963416#comment-13963416 ] Ashu Pachauri commented on HIVE-4629: - Any estimate on when this will be accepted into trunk? HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5888) group by after join operation product no result when hive.optimize.skewjoin = true
[ https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963454#comment-13963454 ] Muthu commented on HIVE-5888: - [~navis] After applying the patch from HIVE-6041 to hive 0.12, queries with auto MAPJOIN fail with the following error. Any workarounds? set hive.optimize.skewjoin=true; set hive.auto.convert.join=true; SELECT ru.userid, SUM(ru.total_count) FROM BIGTABLE ru JOIN SMALLTABLE c on c.creative_id = ru.creative_id JOIN placement_dapi p ON p.placement_id = c.placement_id WHERE ru.realdate = '2014-01-02' AND ru.userid 0 GROUP BY ru.userid; Stage-1 is selected by condition resolver. java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58) at org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917) at org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232) at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185) at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117) at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55) group by after join operation product no result when hive.optimize.skewjoin = true Key: HIVE-5888 URL: https://issues.apache.org/jira/browse/HIVE-5888 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0 Reporter: cyril liao Priority: Critical -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6861) more hadoop2 only golden files to fix
[ https://issues.apache.org/jira/browse/HIVE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963462#comment-13963462 ] Hive QA commented on HIVE-6861: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639124/HIVE-6861.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2179/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2179/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639124 more hadoop2 only golden files to fix - Key: HIVE-6861 URL: https://issues.apache.org/jira/browse/HIVE-6861 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6861.1.patch More hadoop2 golden files to fix due to HIVE-6643, HIVE-6642, HIVE-6808, HIVE-6144. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-6858: --- Attachment: HIVE-6858.2.patch Updated patch with the fix to groupby3_map_skew.q as well, as suggested by [~jdere]. Verified that it passes on both jdk6 and jdk7. Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch, HIVE-6858.2.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run in jdk-7 vs jdk-6. Jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6867) Bucketized Table feature fails in some cases
Laljo John Pullokkaran created HIVE-6867: Summary: Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Bucketized Table feature fails in some cases. if src destination is bucketed on same key, and if actual data in the src is not bucketed (because data got loaded using LOAD DATA LOCAL INPATH ) then the data won't be bucketed while writing to destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1; – perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. This got only discovered due to Hadoop2 changes. In Hadoop1, in local mode, number of reducers will always be 1, regardless of what is requested by app. Hadoop2 now honors the number of reducer setting in local mode (by spawning threads). Long term solution seems to be to prevent load data for bucketed table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963472#comment-13963472 ] Laljo John Pullokkaran commented on HIVE-6867: -- BucketingSortingReduceSinkOptimizer removes RS op if src destination is bucketed on same key. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Bucketized Table feature fails in some cases. if src destination is bucketed on same key, and if actual data in the src is not bucketed (because data got loaded using LOAD DATA LOCAL INPATH ) then the data won't be bucketed while writing to destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1; – perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. This got only discovered due to Hadoop2 changes. In Hadoop1, in local mode, number of reducers will always be 1, regardless of what is requested by app. Hadoop2 now honors the number of reducer setting in local mode (by spawning threads). Long term solution seems to be to prevent load data for bucketed table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-6858: --- Status: Open (was: Patch Available) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch, HIVE-6858.2.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run in jdk-7 vs jdk-6. Jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-6858: --- Status: Patch Available (was: Open) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch, HIVE-6858.2.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run in jdk-7 vs jdk-6. Jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6858) Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7.
[ https://issues.apache.org/jira/browse/HIVE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963470#comment-13963470 ] Jason Dere commented on HIVE-6858: -- Thanks for tracking this one down. +1 Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. --- Key: HIVE-6858 URL: https://issues.apache.org/jira/browse/HIVE-6858 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6858.1.patch, HIVE-6858.2.patch Unit tests decimal_udf.q, vectorization_div0.q fail with jdk-7. {noformat} -250.0 6583411.236 1.0 6583411.236 -0.004 -0.0048 --- -250.0 6583411.236 1.0 6583411.236 -0.0040 -0.0048 {noformat} The following code reproduces this behavior when run in jdk-7 vs jdk-6. Jdk-7 produces -0.004, while jdk-6 produces -0.0040. {code} public class Main { public static void main(String[] a) throws Exception { double val = 0.004; System.out.println("Value = " + val); } } {code} This happens to be a bug in jdk6 that has been fixed in jdk7. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6862: - Summary: add DB schema DDL and upgrade 12to13 scripts for MS SQL Server (was: add DB schema DDL statements for MS SQL Server) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman need to add a unifed 0.13 script and a separate script for ACID support -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-6867: - Description: Bucketized Table feature fails in some cases. if src destination is bucketed on same key, and if actual data in the src is not bucketed (because data got loaded using LOAD DATA LOCAL INPATH ) then the data won't be bucketed while writing to destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; – perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. This got only discovered due to Hadoop2 changes. In Hadoop1, in local mode, number of reducers will always be 1, regardless of what is requested by app. Hadoop2 now honors the number of reducer setting in local mode (by spawning threads). Long term solution seems to be to prevent load data for bucketed table. was: Bucketized Table feature fails in some cases. if src destination is bucketed on same key, and if actual data in the src is not bucketed (because data got loaded using LOAD DATA LOCAL INPATH ) then the data won't be bucketed while writing to destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1; – perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. This got only discovered due to Hadoop2 changes. In Hadoop1, in local mode, number of reducers will always be 1, regardless of what is requested by app. Hadoop2 now honors the number of reducer setting in local mode (by spawning threads). Long term solution seems to be to prevent load data for bucketed table. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Bucketized Table feature fails in some cases. if src destination is bucketed on same key, and if actual data in the src is not bucketed (because data got loaded using LOAD DATA LOCAL INPATH ) then the data won't be bucketed while writing to destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; – perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. This got only discovered due to Hadoop2 changes. In Hadoop1, in local mode, number of reducers will always be 1, regardless of what is requested by app. Hadoop2 now honors the number of reducer setting in local mode (by spawning threads). Long term solution seems to be to prevent load data for bucketed table. -- This message was sent by Atlassian JIRA (v6.2#6252)
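As the HIVE-6867 description notes, LOAD DATA bypasses bucketing entirely; the usual way to get correctly bucketed files is to stage the raw file in a plain table and repopulate the bucketed table with an INSERT ... SELECT while bucketing enforcement is on. A sketch of that workaround follows; the staging table name is illustrative and this is not part of the JIRA's proposed fix.
{code}
-- Illustrative workaround: stage the raw file, then let an insert-select produce
-- properly bucketed (and sorted) files.
CREATE TABLE P1_STAGE (key STRING, val STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1_STAGE;

SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_STAGE;
{code}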
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-5687: Attachment: package.html Remove the hcatalog/streaming/src/docs/package.html and put this file into hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html. I've removed all of the complicated formatting and non-standard characters that Microsoft Word added. It is important for open source projects to have documentation that can be edited. It is also better to include this documentation as part of the javadoc and have links to the API's javadoc rather than reproduce it. Other than replacing the documentation, it is ok. +1 Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6861) more hadoop2 only golden files to fix
[ https://issues.apache.org/jira/browse/HIVE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963515#comment-13963515 ] Ashutosh Chauhan commented on HIVE-6861: +1 more hadoop2 only golden files to fix - Key: HIVE-6861 URL: https://issues.apache.org/jira/browse/HIVE-6861 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6861.1.patch More hadoop2 golden files to fix due to HIVE-6643, HIVE-6642, HIVE-6808, HIVE-6144. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6868) Create table in HCatalog sets different SerDe defaults than what is set through the CLI
[ https://issues.apache.org/jira/browse/HIVE-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6868: Attachment: HIVE-6868.1.patch Create table in HCatalog sets different SerDe defaults than what is set through the CLI --- Key: HIVE-6868 URL: https://issues.apache.org/jira/browse/HIVE-6868 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Harish Butani Attachments: HIVE-6868.1.patch HCatCreateTableDesc doesn't invoke the getEmptyTable function on org.apache.hadoop.hive.ql.metadata.Table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6863) HiveServer2 binary mode throws exception with PAM
[ https://issues.apache.org/jira/browse/HIVE-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963545#comment-13963545 ] Harish Butani commented on HIVE-6863: - +1 for 0.13 HiveServer2 binary mode throws exception with PAM - Key: HIVE-6863 URL: https://issues.apache.org/jira/browse/HIVE-6863 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6863.1.patch Works fine in http mode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6822) TestAvroSerdeUtils fails with -Phadoop-2
[ https://issues.apache.org/jira/browse/HIVE-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6822: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. TestAvroSerdeUtils fails with -Phadoop-2 Key: HIVE-6822 URL: https://issues.apache.org/jira/browse/HIVE-6822 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-6822.1.patch Works fine with -Phadoop-1, but with -Phadoop-2 hits the following error: {noformat} Running org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.603 sec FAILURE! - in org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils determineSchemaCanReadSchemaFromHDFS(org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils) Time elapsed: 0.688 sec ERROR! java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at org.apache.hadoop.http.HttpServer2.addJerseyResourcePackage(HttpServer2.java:564) at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.initWebHdfs(NameNodeHttpServer.java:84) at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:121) at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:601) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:658) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:643) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1259) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:914) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:805) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:663) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:603) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:474) at org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS(TestAvroSerdeUtils.java:189) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963560#comment-13963560 ] Roshan Naik commented on HIVE-5687: --- Owen: Thanks a lot for revising package.html Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6868) Create table in HCatalog sets different SerDe defaults than what is set through the CLI
Harish Butani created HIVE-6868: --- Summary: Create table in HCatalog sets different SerDe defaults than what is set through the CLI Key: HIVE-6868 URL: https://issues.apache.org/jira/browse/HIVE-6868 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Harish Butani HCatCreateTableDesc doesn't invoke the getEmptyTable function on org.apache.hadoop.hive.ql.metadata.Table -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6869) Golden file updates for tez tests.
Ashutosh Chauhan created HIVE-6869: -- Summary: Golden file updates for tez tests. Key: HIVE-6869 URL: https://issues.apache.org/jira/browse/HIVE-6869 Project: Hive Issue Type: Task Components: Tests, Tez Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6869) Golden file updates for tez tests.
[ https://issues.apache.org/jira/browse/HIVE-6869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6869: --- Attachment: HIVE-6869.patch Golden file updates for tez tests. -- Key: HIVE-6869 URL: https://issues.apache.org/jira/browse/HIVE-6869 Project: Hive Issue Type: Task Components: Tests, Tez Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6869.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6869) Golden file updates for tez tests.
[ https://issues.apache.org/jira/browse/HIVE-6869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6869: --- Status: Patch Available (was: Open) Golden file updates for tez tests. -- Key: HIVE-6869 URL: https://issues.apache.org/jira/browse/HIVE-6869 Project: Hive Issue Type: Task Components: Tests, Tez Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6869.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4790) MapredLocalTask task does not make virtual columns
[ https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963612#comment-13963612 ] Hive QA commented on HIVE-4790: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639138/HIVE-4790.7.patch.txt {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7 {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2180/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2180/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12639138 MapredLocalTask task does not make virtual columns -- Key: HIVE-4790 URL: https://issues.apache.org/jira/browse/HIVE-4790 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D11511.3.patch, D11511.4.patch, HIVE-4790.5.patch.txt, HIVE-4790.6.patch.txt, HIVE-4790.7.patch.txt, HIVE-4790.D11511.1.patch, HIVE-4790.D11511.2.patch From mailing list, http://www.mail-archive.com/user@hive.apache.org/msg08264.html {noformat} SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; fails with this error: SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; Automatically selecting local only mode for query Total MapReduce jobs = 1 setting HADOOP_USER_NAMEpmarron 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. Execution log at: /tmp/pmarron/.log 2013-06-25 10:52:56 Starting to launch local task to process map join; maximum memory = 932118528 java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366) at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
[jira] [Updated] (HIVE-5072) [WebHCat]Enable directly invoke Sqoop job through Templeton
[ https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5072: - Attachment: HIVE-5072.3.patch Rebased the patch and added an e2e test for the Templeton-Sqoop action in HIVE-5072.3.patch [WebHCat]Enable directly invoke Sqoop job through Templeton --- Key: HIVE-5072 URL: https://issues.apache.org/jira/browse/HIVE-5072 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.12.0 Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, Templeton-Sqoop-Action.pdf Now it is hard to invoke a Sqoop job through Templeton. The only way is to use the classpath jar generated by a Sqoop job and use the jar delegator in Templeton. We should implement a Sqoop delegator to enable directly invoking Sqoop jobs through Templeton. -- This message was sent by Atlassian JIRA (v6.2#6252)
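A hedged sketch of what directly invoking a Sqoop job through Templeton could look like from a client, modeled on the existing Pig and Hive delegator endpoints; the endpoint path and parameter names below are assumptions for illustration, and the actual ones are defined by the attached patch and design doc.
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class TempletonSqoopSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical form parameters: the Sqoop command line plus a status directory.
    String body = "user.name=" + URLEncoder.encode("hcat", "UTF-8")
        + "&command=" + URLEncoder.encode(
            "import --connect jdbc:mysql://dbhost/sales --table orders --target-dir /data/orders",
            "UTF-8")
        + "&statusdir=" + URLEncoder.encode("sqoop.output", "UTF-8");

    // Hypothetical endpoint, by analogy with templeton/v1/pig and templeton/v1/hive.
    HttpURLConnection conn = (HttpURLConnection)
        new URL("http://webhcat-host:50111/templeton/v1/sqoop").openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes("UTF-8"));
    }
    System.out.println("HTTP " + conn.getResponseCode()); // response body carries the job id
  }
}
{code}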
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963641#comment-13963641 ] Pala M Muthaia commented on HIVE-6131: -- I looked into the failures above and revisited HIVE-3833 with more context now: 1. LazyBinaryColumnarSerde requires partition-level metadata to read existing data; it needs the exact metadata used when serializing the data, so it cannot use table-level metadata, which could have changed. 2. Other serdes/formats that support schema change need the updated schema to support newly appended data with new columns. So it seems we should pass the table metadata or partition metadata selectively, depending on what the storage/serde supports. Is there a way to identify the serdes/formats that do not support a newer schema programmatically? I don't see anything obvious. The alternatives are to: a. Add such metadata to the serde info and populate it for all serdes. This may have been discussed briefly in HIVE-3833, and it looks like this will be a large change because it essentially modifies the interface for a plugin. b. Hardcode a white- or blacklist of serdes and pass table- or partition-level metadata accordingly. [~ashutoshc], [~szehon], any thoughts on the above, particularly are there other alternatives? New columns after table alter result in null values despite data Key: HIVE-6131 URL: https://issues.apache.org/jira/browse/HIVE-6131 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0, 0.13.0 Reporter: James Vaughan Priority: Minor Attachments: HIVE-6131.1.patch Hi folks, I found and verified a bug on our CDH 4.0.3 install of Hive when adding columns to tables with Partitions using 'REPLACE COLUMNS'. I dug through the Jira a little bit and didn't see anything for it, so hopefully this isn't just noise on the radar. Basically, when you alter a table with partitions and then reupload data to that partition, it doesn't seem to recognize the extra data that actually exists in HDFS - as in, it returns NULL values on the new column despite having the data and recognizing the new column in the metadata. Here are some steps to reproduce using a basic table: 1. Run this hive command: CREATE TABLE jvaughan_test (col1 string) partitioned by (day string); 2. Create a simple file on the system with a couple of entries, something like 'hi' and 'hi2' separated by newlines. 3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH 'FILEDIR' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02'); 4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day = '2014-01-02'; 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS (col1 string, col2 string); 6. Edit your file and add a second column using the default separator (ctrl+v, then ctrl+a in Vim) and add two more entries, such as hi3 on the first row and hi4 on the second. 7. Run step 3 again. 8. Check the data again as in step 4. For me, these are the results that get returned: hive> select * from jvaughan_test where day = '2014-01-02'; OK hi NULL 2014-01-02 hi2 NULL 2014-01-02 This is despite the fact that there is data in the file stored by the partition in HDFS. Let me know if you need any other information. The only workaround for me currently is to drop partitions for any I'm replacing data in and THEN reupload the new data file. Thanks, -James -- This message was sent by Atlassian JIRA (v6.2#6252)
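A bare-bones sketch of alternative (b) from the comment above: a hardcoded list of serde classes that must be handed the exact partition-level metadata, with every other serde getting the (possibly newer) table-level schema. The class and method names are illustrative only; this is not the proposed patch.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class SchemaSelectionSketch {
  // Serdes known to require the exact metadata the partition was written with.
  private static final Set<String> NEEDS_PARTITION_METADATA = new HashSet<String>(Arrays.asList(
      "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"));

  // Decide which Properties object to hand to the serde when reading a partition.
  static Properties schemaFor(String serdeClass, Properties tableProps, Properties partitionProps) {
    return NEEDS_PARTITION_METADATA.contains(serdeClass) ? partitionProps : tableProps;
  }
}
{code}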
[jira] [Commented] (HIVE-6869) Golden file updates for tez tests.
[ https://issues.apache.org/jira/browse/HIVE-6869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963689#comment-13963689 ] Vikram Dixit K commented on HIVE-6869: -- Thanks Ashutosh! LGTM +1 Golden file updates for tez tests. -- Key: HIVE-6869 URL: https://issues.apache.org/jira/browse/HIVE-6869 Project: Hive Issue Type: Task Components: Tests, Tez Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6869.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view
[ https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Wang updated HIVE-6765: -- Status: Patch Available (was: Open) ASTNodeOrigin unserializable leads to fail when join with view -- Key: HIVE-6765 URL: https://issues.apache.org/jira/browse/HIVE-6765 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Adrian Wang Fix For: 0.13.0 Attachments: HIVE-6765.patch.1 when a view contains a UDF, and the view comes into a JOIN operation, Hive will encounter a bug with stack trace like Caused by: java.lang.InstantiationException: org.apache.hadoop.hive.ql.parse.ASTNodeOrigin at java.lang.Class.newInstance0(Class.java:359) at java.lang.Class.newInstance(Class.java:327) at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6863) HiveServer2 binary mode throws exception with PAM
[ https://issues.apache.org/jira/browse/HIVE-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963699#comment-13963699 ] Hive QA commented on HIVE-6863: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639141/HIVE-6863.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2181/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2181/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639141 HiveServer2 binary mode throws exception with PAM - Key: HIVE-6863 URL: https://issues.apache.org/jira/browse/HIVE-6863 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6863.1.patch Works fine in http mode -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 20145: HIVE-6648 - Permissions are not inherited correctly when tables have multiple partition columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20145/ --- Review request for hive. Repository: hive-git Description --- Hive.copyFiles behaves correctly for subdirectory permission inheritance only in the case of a one-level insert. To handle static partitions (or any multi-directory case), I keep track of the permission of the first existing parent, and then apply it to the entire sub-tree. Had to do this manually, as FileSystem.mkdirs(child, perm) will only apply perm to the child itself, and not to the other intermediate parents created. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestFolderPermissions.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e6cb70f Diff: https://reviews.apache.org/r/20145/diff/ Testing --- Fortunately, copyFiles uses the same code for the hdfs and local cases, so I was able to write a unit test to reproduce the issue. I tried to write a qfile test, but it did not work because 'dfs -ls' output is masked and cannot be compared, so I ended up writing a junit test. Thanks, Szehon Ho
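The approach described above, sketched against the Hadoop FileSystem API: find the first existing ancestor, remember its permission, then create the missing directories level by level and stamp each newly created one explicitly, since a mkdirs call with a permission only affects the leaf. This is an illustration of the idea under those assumptions, not the patch itself.
{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class InheritPermsSketch {
  static void mkdirsInheriting(FileSystem fs, Path target) throws IOException {
    // Walk upwards until an existing ancestor is found, remembering what must be created.
    Deque<Path> toCreate = new ArrayDeque<Path>();
    Path ancestor = target;
    while (ancestor != null && !fs.exists(ancestor)) {
      toCreate.push(ancestor);
      ancestor = ancestor.getParent();
    }
    if (ancestor == null || toCreate.isEmpty()) {
      return; // nothing to create, or no existing ancestor to inherit from
    }
    FsPermission inherited = fs.getFileStatus(ancestor).getPermission();

    // Create each missing level and apply the first existing parent's permission to it.
    while (!toCreate.isEmpty()) {
      Path dir = toCreate.pop();
      fs.mkdirs(dir);
      fs.setPermission(dir, inherited);
    }
  }
}
{code}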
[jira] [Updated] (HIVE-6648) Permissions are not inherited correctly when tables have multiple partition columns
[ https://issues.apache.org/jira/browse/HIVE-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6648: Attachment: HIVE-6648.patch This patch fixes the issue described. After creating static partitions (insert into ... partition...), the partition directory and its intermediate created directories now have the parent's permission. See the newly added test case. One note - the fix is in Hive.copyFile(). This JIRA also describes another problematic method (Warehouse.mkdirs()). It is actually not invoked in this particular code path, and I can probably take a look in a follow-up JIRA to fix the cases where it is invoked. Permissions are not inherited correctly when tables have multiple partition columns --- Key: HIVE-6648 URL: https://issues.apache.org/jira/browse/HIVE-6648 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Henry Robinson Assignee: Szehon Ho Attachments: HIVE-6648.patch {{Warehouse.mkdirs()}} always looks at the immediate parent of the path that it creates when determining what permissions to inherit. However, it may have created that parent directory as well, in which case it will have the default permissions and will not have inherited them. This is a problem when performing an {{INSERT}} into a table with more than one partition column. E.g., in an empty table: {{INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... }} A new subdirectory /p1=1/p2=2 will be created, and with permission inheritance (per HIVE-2504) enabled, the intention is presumably for both new directories to inherit the root table dir's permissions. However, {{mkdirs()}} will only set the permission of the leaf directory (i.e. /p2=2/), and then only to the permissions of /p1=1/, which was just created.
{code}
public boolean mkdirs(Path f) throws MetaException {
  FileSystem fs = null;
  try {
    fs = getFs(f);
    LOG.debug("Creating directory if it doesn't exist: " + f);
    // Check if the directory already exists. We want to change the permission
    // to that of the parent directory only for newly created directories.
    if (this.inheritPerms) {
      try {
        return fs.getFileStatus(f).isDir();
      } catch (FileNotFoundException ignore) {
      }
    }
    boolean success = fs.mkdirs(f);
    if (this.inheritPerms && success) {
      // Set the permission of the parent directory.
      // HNR: This is the bug - getParent() may refer to a just-created directory.
      fs.setPermission(f, fs.getFileStatus(f.getParent()).getPermission());
    }
    return success;
  } catch (IOException e) {
    closeFs(fs);
    MetaStoreUtils.logAndThrowMetaException(e);
  }
  return false;
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6648) Permissions are not inherited correctly when tables have multiple partition columns
[ https://issues.apache.org/jira/browse/HIVE-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6648: Affects Version/s: 0.13.0 Status: Patch Available (was: Open) Permissions are not inherited correctly when tables have multiple partition columns --- Key: HIVE-6648 URL: https://issues.apache.org/jira/browse/HIVE-6648 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Henry Robinson Assignee: Szehon Ho Attachments: HIVE-6648.patch {{Warehouse.mkdirs()}} always looks at the immediate parent of the path that it creates when determining what permissions to inherit. However, it may have created that parent directory as well, in which case it will have the default permissions and will not have inherited them. This is a problem when performing an {{INSERT}} into a table with more than one partition column. E.g., in an empty table: {{INSERT INTO TABLE tbl PARTITION(p1=1, p2=2) ... }} A new subdirectory /p1=1/p2=2 will be created, and with permission inheritance (per HIVE-2504) enabled, the intention is presumably for both new directories to inherit the root table dir's permissions. However, {{mkdirs()}} will only set the permission of the leaf directory (i.e. /p2=2/), and then only to the permissions of /p1=1/, which was just created.
{code}
public boolean mkdirs(Path f) throws MetaException {
  FileSystem fs = null;
  try {
    fs = getFs(f);
    LOG.debug("Creating directory if it doesn't exist: " + f);
    // Check if the directory already exists. We want to change the permission
    // to that of the parent directory only for newly created directories.
    if (this.inheritPerms) {
      try {
        return fs.getFileStatus(f).isDir();
      } catch (FileNotFoundException ignore) {
      }
    }
    boolean success = fs.mkdirs(f);
    if (this.inheritPerms && success) {
      // Set the permission of the parent directory.
      // HNR: This is the bug - getParent() may refer to a just-created directory.
      fs.setPermission(f, fs.getFileStatus(f.getParent()).getPermission());
    }
    return success;
  } catch (IOException e) {
    closeFs(fs);
    MetaStoreUtils.logAndThrowMetaException(e);
  }
  return false;
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6825) custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned
[ https://issues.apache.org/jira/browse/HIVE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6825: --- Fix Version/s: (was: 0.14.0) 0.13.0 custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned Key: HIVE-6825 URL: https://issues.apache.org/jira/browse/HIVE-6825 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.13.0 Attachments: HIVE-6825.01.patch, HIVE-6825.patch Currently the jars are uploaded to either user directory or global, whatever is configured, which is a mess and can cause collisions. We can upload to scratch directory, and/or version. There's a tradeoff between having to upload files every time (for example, for commonly used things like HBase input format) (which is what is done now, into global/user path), and having a mess of one-off custom jars and files, versioned, sitting in .hiveJars. -- This message was sent by Atlassian JIRA (v6.2#6252)
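A hedged sketch of the "versioned jars" idea mentioned at the end of the description: name the uploaded copy after a content hash so that re-uploading the same jar is a no-op and different builds never collide. The directory name and naming scheme below are illustrative only.
{code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class VersionedJarSketch {
  // Derive a content-based name such as myudf-<sha256 prefix>.jar for the shared directory.
  static String versionedName(Path jar) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    try (InputStream in = Files.newInputStream(jar)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    String base = jar.getFileName().toString().replaceFirst("\\.jar$", "");
    return base + "-" + hex.substring(0, 16) + ".jar";
  }

  public static void main(String[] args) throws Exception {
    // Upload target, e.g. /user/hive/.hiveJars/<versioned name>; skip the copy if it already exists.
    System.out.println(".hiveJars/" + versionedName(Paths.get(args[0])));
  }
}
{code}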
Re: Review Request 19754: Defines a api for streaming data into Hive using ACID support.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19754/#review39817 --- hcatalog/streaming/pom.xml https://reviews.apache.org/r/19754/#comment72461 typo: artifectId should be artifactId hcatalog/streaming/pom.xml https://reviews.apache.org/r/19754/#comment72462 typo: artifectId should be artifactId hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java https://reviews.apache.org/r/19754/#comment72463 suggestion for Txnid: either spell out transaction (transaction ID -- preferable) or use capital I like the parameter (TxnId) hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java https://reviews.apache.org/r/19754/#comment72464 Why does the parameter name have both-caps ID for maxTxnID while it's init-cap Id for minTxnId? Are parameter names case-sensitive? Also a suggestion for Txnid in description: either spell out transaction (transaction ID -- preferable) or use capital ID like the parameter (TxnID). hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java https://reviews.apache.org/r/19754/#comment72465 Same question as line 108 about minTxnId vs maxTxnID capitalization hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72520 Nit: period at the end (next line too) hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72466 Editorial nits: Please capitalize nulls and end the second sentence with a period (next line) just for consistency with the first sentence. hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72467 Grammar nit: Remove s from indicates because the subjects are plural. hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72468 Consistency nit: Since other param descriptions are capitalized on the first word, please do the same here. Bonus points if you capitalize all the param descriptions in this patch, but I'm not going to comment on all of them. You could argue for a rule that only capitalizes full sentences and proper nouns like Hive, in which case [pun alert] it's okay to leave input uncapitalized. But I favor visual consistency over rule consistency, except when I'm inconsistent. Terminal periods aren't essential (given the typical style of javadocs) but they're recommended when a description has multiple sentences. Hm, but that's inconsistent with my visual consistency preference. Why am I wasting your time with this trivia? hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72517 should endpoint be explained? (your call) hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72469 Editorial nit: non existing seems okay in this context, but nonexistent is the real word (your choice). Consistency nit again: Since other exception descriptions are capitalized on the first word, please do the same here. 
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72470 ditto line 57 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72471 ditto line 58 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72472 ditto line 59 (capitalization) hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72474 ditto line 60 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72473 Hive nit: please capitalize hive Editorial nits: please capitalize a and perhaps spell out configuration in conf object unless conf is the proper term for the object hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72475 ditto line 65 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java https://reviews.apache.org/r/19754/#comment72516 ditto line 59 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java
[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6862: - Description: need to add a unified 0.13 script and a separate script for ACID support NO PRECOMMIT TESTS was:need to add a unified 0.13 script and a separate script for ACID support add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-6862.patch need to add a unified 0.13 script and a separate script for ACID support NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6862: - Attachment: HIVE-6862.patch add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-6862.patch need to add a unified 0.13 script and a separate script for ACID support -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6862) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server
[ https://issues.apache.org/jira/browse/HIVE-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6862: - Status: Patch Available (was: Open) add DB schema DDL and upgrade 12to13 scripts for MS SQL Server -- Key: HIVE-6862 URL: https://issues.apache.org/jira/browse/HIVE-6862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-6862.patch need to add a unified 0.13 script and a separate script for ACID support NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6825) custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned
[ https://issues.apache.org/jira/browse/HIVE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6825: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to trunk and 13. Resolved a trivial conflict on commit custom jars for Hive query should be uploaded to scratch dir per query; and/or versioned Key: HIVE-6825 URL: https://issues.apache.org/jira/browse/HIVE-6825 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.13.0 Attachments: HIVE-6825.01.patch, HIVE-6825.patch Currently the jars are uploaded to either user directory or global, whatever is configured, which is a mess and can cause collisions. We can upload to scratch directory, and/or version. There's a tradeoff between having to upload files every time (for example, for commonly used things like HBase input format) (which is what is done now, into global/user path), and having a mess of one-off custom jars and files, versioned, sitting in .hiveJars. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6131) New columns after table alter result in null values despite data
[ https://issues.apache.org/jira/browse/HIVE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963728#comment-13963728 ] Szehon Ho commented on HIVE-6131: - Hm, I understand the file format may differ between partition and table; that was the point of HIVE-3833. But just for my understanding, did you find any use for the partition columns being different from the table columns (being the original)? In my experience, I have seen that LazyBinaryColumnarSerde (and other serdes) can use a schema with more columns to deserialize data than what the data was written with. If that's the case, can't we make the column set the same for the partition and the table during 'alter table'? New columns after table alter result in null values despite data Key: HIVE-6131 URL: https://issues.apache.org/jira/browse/HIVE-6131 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.12.0, 0.13.0 Reporter: James Vaughan Priority: Minor Attachments: HIVE-6131.1.patch Hi folks, I found and verified a bug on our CDH 4.0.3 install of Hive when adding columns to tables with Partitions using 'REPLACE COLUMNS'. I dug through the Jira a little bit and didn't see anything for it, so hopefully this isn't just noise on the radar. Basically, when you alter a table with partitions and then reupload data to that partition, it doesn't seem to recognize the extra data that actually exists in HDFS - as in, it returns NULL values on the new column despite having the data and recognizing the new column in the metadata. Here are some steps to reproduce using a basic table: 1. Run this hive command: CREATE TABLE jvaughan_test (col1 string) partitioned by (day string); 2. Create a simple file on the system with a couple of entries, something like 'hi' and 'hi2' separated by newlines. 3. Run this hive command, pointing it at the file: LOAD DATA LOCAL INPATH 'FILEDIR' OVERWRITE INTO TABLE jvaughan_test PARTITION (day = '2014-01-02'); 4. Confirm the data with: SELECT * FROM jvaughan_test WHERE day = '2014-01-02'; 5. Alter the column definitions: ALTER TABLE jvaughan_test REPLACE COLUMNS (col1 string, col2 string); 6. Edit your file and add a second column using the default separator (ctrl+v, then ctrl+a in Vim) and add two more entries, such as hi3 on the first row and hi4 on the second. 7. Run step 3 again. 8. Check the data again as in step 4. For me, these are the results that get returned: hive> select * from jvaughan_test where day = '2014-01-02'; OK hi NULL 2014-01-02 hi2 NULL 2014-01-02 This is despite the fact that there is data in the file stored by the partition in HDFS. Let me know if you need any other information. The only workaround for me currently is to drop partitions for any I'm replacing data in and THEN reupload the new data file. Thanks, -James -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 19903: Support bulk deleting directories for partition drop with partial spec
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19903/ --- (Updated April 9, 2014, 3:26 a.m.) Review request for hive. Changes --- Fixed test fail Bugs: HIVE-6809 https://issues.apache.org/jira/browse/HIVE-6809 Repository: hive-git Description --- In busy hadoop system, dropping many of partitions takes much more time than expected. In hive-0.11.0, removing 1700 partitions by single partial spec took 90 minutes, which is reduced to 3 minutes when deleteData is set false. I couldn't test this in recent hive, which has HIVE-6256 but if the time-taking part is mostly from removing directories, it seemed not helpful to reduce whole processing time. Diffs (updated) - hcatalog/core/src/main/java/org/apache/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java d348b9b metastore/if/hive_metastore.thrift eef1b80 metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7 metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874 metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57 metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 18e62d8 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java f731dab metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1 metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e6cb70f ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a40a88d ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION ql/src/test/results/clientnegative/drop_partition_failure.q.out cde0abb ql/src/test/results/clientnegative/drop_partition_filter_failure.q.out c4f533b ql/src/test/results/clientpositive/drop_multi_partitions.q.out eae57f3 ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION Diff: https://reviews.apache.org/r/19903/diff/ Testing --- Thanks, Navis Ryu
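As a toy illustration of why a bulk path helps (only an illustration of the idea; the actual change is in the metastore and Warehouse code paths): when the partial spec maps to a single subtree, one recursive delete replaces one delete call per partition directory.
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkDropSketch {
  // Slow path: one round trip to the filesystem per partition directory.
  static void dropOneByOne(FileSystem fs, List<Path> partitionDirs) throws IOException {
    for (Path dir : partitionDirs) {
      fs.delete(dir, true);
    }
  }

  // Bulk path: when a partial spec such as dt=2014-04-09 corresponds to one subtree,
  // a single recursive delete removes every matching partition directory at once.
  static void dropSubtree(FileSystem fs, Path partialSpecDir) throws IOException {
    fs.delete(partialSpecDir, true);
  }
}
{code}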
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963757#comment-13963757 ] Hive QA commented on HIVE-3972: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639142/HIVE-3972.8.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5556 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orderby_query_bucketing org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.testExecuteStatementAsync {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2182/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12639142 Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, HIVE-3972.8.patch.txt, HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch Queries for fetching results which have lastly order by clause make final MR run with single reducer, which can be too much. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If number of reducer is reasonable, multiple result files could be merged into single sorted stream in the fetcher level. -- This message was sent by Atlassian JIRA (v6.2#6252)
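A minimal sketch of the fetcher-level merge the description alludes to: each reducer's output is already sorted, so the client can merge the files with a priority queue instead of forcing everything through a single reducer. Types are simplified to plain strings here; the actual patch works on Hive's serialized rows and sort keys.
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SortedMergeSketch {
  // One heap entry per input file: the current line plus the reader it came from.
  static final class Head implements Comparable<Head> {
    final String line;
    final BufferedReader reader;
    Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    public int compareTo(Head o) { return line.compareTo(o.line); }
  }

  // Merge already-sorted files into one globally sorted stream.
  static List<String> merge(List<Path> sortedFiles) throws IOException {
    PriorityQueue<Head> heap = new PriorityQueue<Head>();
    for (Path p : sortedFiles) {
      BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8);
      String first = r.readLine();
      if (first != null) {
        heap.add(new Head(first, r));
      }
    }
    List<String> out = new ArrayList<String>();
    while (!heap.isEmpty()) {
      Head h = heap.poll();
      out.add(h.line);
      String next = h.reader.readLine();
      if (next != null) {
        heap.add(new Head(next, h.reader));
      } else {
        h.reader.close();
      }
    }
    return out;
  }
}
{code}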
[jira] [Updated] (HIVE-6853) show create table for hbase tables should exclude LOCATION
[ https://issues.apache.org/jira/browse/HIVE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Christine updated HIVE-6853: --- Attachment: HIVE-6853.patch bq: is it needed to make a StringBuilder when there is only one string to return? Fixed. I removed it and just returned the string. show create table for hbase tables should exclude LOCATION --- Key: HIVE-6853 URL: https://issues.apache.org/jira/browse/HIVE-6853 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.10.0 Reporter: Miklos Christine Attachments: HIVE-6853-0.patch, HIVE-6853.patch If you create a table on top of hbase in hive and issue a show create table hbase_table, it gives a bad DDL. It should not show LOCATION: [hive]$ cat /tmp/test_create.sql CREATE EXTERNAL TABLE nba_twitter.hbase2( key string COMMENT 'from deserializer', name string COMMENT 'from deserializer', pdt string COMMENT 'from deserializer', service string COMMENT 'from deserializer', term string COMMENT 'from deserializer', update1 string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'serialization.format'='1', 'hbase.columns.mapping'=':key,srv:name,srv:pdt,srv:service,srv:term,srv:update') LOCATION 'hdfs://nameservice1/user/hive/warehouse/nba_twitter.db/hbase' TBLPROPERTIES ( 'hbase.table.name'='NBATwitter', 'transient_lastDdlTime'='1386172188') Trying to create a table using the above fails: [hive]$ hive -f /tmp/test_create.sql cli -f /tmp/test_create.sql Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties FAILED: Error in metadata: MetaException(message:LOCATION may not be specified for HBase.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask However, if I remove the LOCATION, then the DDL is valid. -- This message was sent by Atlassian JIRA (v6.2#6252)
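A hedged sketch of the check a fix like this presumably needs somewhere in the SHOW CREATE TABLE code path: only emit the LOCATION clause when the table is native, i.e. not managed by a storage handler. The parameter key and helper below are illustrative, not the actual patch.
{code}
import java.util.Map;

public class ShowCreateTableSketch {
  // Decide whether the generated DDL should include a LOCATION clause.
  static String locationClause(Map<String, String> tableParams, String location) {
    // Storage-handler-backed tables (e.g. HBase) reject LOCATION on CREATE TABLE.
    boolean handlerManaged = tableParams.containsKey("storage_handler"); // illustrative key
    return handlerManaged ? "" : "LOCATION\n  '" + location + "'\n";
  }
}
{code}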
[jira] [Updated] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3972: Attachment: HIVE-3972.9.patch.txt Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch Queries for fetching results which have lastly order by clause make final MR run with single reducer, which can be too much. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If number of reducer is reasonable, multiple result files could be merged into single sorted stream in the fetcher level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns
[ https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4790: Status: Open (was: Patch Available) Things are fucked-up rebasing on trunk. We should skip this in hive-0.13.0. MapredLocalTask task does not make virtual columns -- Key: HIVE-4790 URL: https://issues.apache.org/jira/browse/HIVE-4790 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D11511.3.patch, D11511.4.patch, HIVE-4790.5.patch.txt, HIVE-4790.6.patch.txt, HIVE-4790.7.patch.txt, HIVE-4790.D11511.1.patch, HIVE-4790.D11511.2.patch From mailing list, http://www.mail-archive.com/user@hive.apache.org/msg08264.html {noformat} SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; fails with this error: SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; Automatically selecting local only mode for query Total MapReduce jobs = 1 setting HADOOP_USER_NAMEpmarron 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. Execution log at: /tmp/pmarron/.log 2013-06-25 10:52:56 Starting to launch local task to process map join; maximum memory = 932118528 java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366) at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68) at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Execution failed with exit status: 2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results
[ https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963766#comment-13963766 ] Brock Noland commented on HIVE-3972: Looks like the .out file contains a ^A or something: diff --git ql/src/test/results/clientpositive/orderby_query_bucketing.q.out ql/src/test/results/clientpositive/orderby_query_bucketing.q.out new file mode 100644 index 000..c02b1c9 Binary files /dev/null and ql/src/test/results/clientpositive/orderby_query_bucketing.q.out differ Support using multiple reducer for fetching order by results Key: HIVE-3972 URL: https://issues.apache.org/jira/browse/HIVE-3972 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D8349.5.patch, D8349.6.patch, D8349.7.patch, HIVE-3972.8.patch.txt, HIVE-3972.9.patch.txt, HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch Queries for fetching results which have lastly order by clause make final MR run with single reducer, which can be too much. For example, {code} select value, sum(key) as sum from src group by value order by sum; {code} If number of reducer is reasonable, multiple result files could be merged into single sorted stream in the fetcher level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6870) Fix maven.repo.local setting in Hive build
Jason Dere created HIVE-6870: Summary: Fix maven.repo.local setting in Hive build Key: HIVE-6870 URL: https://issues.apache.org/jira/browse/HIVE-6870 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Jason Dere Assignee: Jason Dere The pom.xml currently assumes maven.repo.local should be ${user.home}/.m2/repository. If the user has overridden the local repository through Maven settings, tests which assume the hive-exec JAR is at ${user.home}/.m2/repository will fail because the artifacts will not be installed at that location. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6870) Fix maven.repo.local setting in Hive build
[ https://issues.apache.org/jira/browse/HIVE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6870: - Status: Patch Available (was: Open) Fix maven.repo.local setting in Hive build -- Key: HIVE-6870 URL: https://issues.apache.org/jira/browse/HIVE-6870 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6870.1.patch The pom.xml currently assumes maven.repo.local should be ${user.home}/.m2/repository. If the user has overridden the local repository through Maven settings, tests which assume the hive-exec JAR is at ${user.home}/.m2/repository will fail because the artifacts will not be installed at that location. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6870) Fix maven.repo.local setting in Hive build
[ https://issues.apache.org/jira/browse/HIVE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-6870: - Attachment: HIVE-6870.1.patch Use ${settings.localRepository} for the maven.repo.local property. Fix maven.repo.local setting in Hive build -- Key: HIVE-6870 URL: https://issues.apache.org/jira/browse/HIVE-6870 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6870.1.patch The pom.xml currently assumes maven.repo.local should be ${user.home}/.m2/repository. If the user has overridden the local repository through Maven settings, tests which assume the hive-exec JAR is at ${user.home}/.m2/repository will fail because the artifacts will not be installed at that location. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6853) show create table for hbase tables should exclude LOCATION
[ https://issues.apache.org/jira/browse/HIVE-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963806#comment-13963806 ] Szehon Ho commented on HIVE-6853: - Thanks, for the most part it LGTM. I guess it's not the cleanest, as it's breaking the StorageHandler abstraction. It would probably be cleaner to add a hook to the StorageHandler interface, but due to backward compatibility, it's probably not worth it for this use case. +1 (non-binding) show create table for hbase tables should exclude LOCATION --- Key: HIVE-6853 URL: https://issues.apache.org/jira/browse/HIVE-6853 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.10.0 Reporter: Miklos Christine Attachments: HIVE-6853-0.patch, HIVE-6853.patch If you create a table on top of hbase in hive and issue a show create table hbase_table, it gives a bad DDL. It should not show LOCATION: [hive]$ cat /tmp/test_create.sql CREATE EXTERNAL TABLE nba_twitter.hbase2( key string COMMENT 'from deserializer', name string COMMENT 'from deserializer', pdt string COMMENT 'from deserializer', service string COMMENT 'from deserializer', term string COMMENT 'from deserializer', update1 string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'serialization.format'='1', 'hbase.columns.mapping'=':key,srv:name,srv:pdt,srv:service,srv:term,srv:update') LOCATION 'hdfs://nameservice1/user/hive/warehouse/nba_twitter.db/hbase' TBLPROPERTIES ( 'hbase.table.name'='NBATwitter', 'transient_lastDdlTime'='1386172188') Trying to create a table using the above fails: [hive]$ hive -f /tmp/test_create.sql cli -f /tmp/test_create.sql Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hive/lib/hive-common-0.10.0-cdh4.4.0.jar!/hive-log4j.properties FAILED: Error in metadata: MetaException(message:LOCATION may not be specified for HBase.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask However, if I remove the LOCATION, then the DDL is valid. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963820#comment-13963820 ] Roshan Naik commented on HIVE-5687: --- I had posted the revised patch on RB Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Timeline for the Hive 0.13 release?
Hi, We are getting close to having all the issues resolved. I have the following list of open jiras as needing to go into 0.13 6863, 5687, 6604,6850,6818,6732,4904,5376, and 6319. These are all close to being committed. Some are waiting for the 24hr period, a couple have been reviewed and +1ed, waiting for tests to pass. So lets shoot for closing out all these issues by Thursday 12pm PST. Would like to cut an rc by Thursday afternoon. regards, Harish. On Mar 26, 2014, at 7:14 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Hi Harish Can we have the following bugs for 0.13? These bugs are related to feature HIVE-6455 added as part of 0.13. https://issues.apache.org/jira/browse/HIVE-6748 (Resource leak bug) https://issues.apache.org/jira/browse/HIVE-6760 (Bug in handling list bucketing) https://issues.apache.org/jira/browse/HIVE-6761 (Bug with hashcodes generation) Thanks Prasanth Jayachandran On Mar 26, 2014, at 1:22 PM, Hari Subramaniyan hsubramani...@hortonworks.com wrote: Hi Harish Can you include HIVE-6708. It covers quite a number of issues associated with Vectorization(including some correctness issues and exceptions). Thanks Hari On Tue, Mar 25, 2014 at 12:01 PM, Xuefu Zhang xzh...@cloudera.com wrote: Harish, Could we include HIVE-6740? Thanks, Xuefu On Thu, Mar 20, 2014 at 7:27 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Harish, Could you add the following bugs as well? Following are related to LazyMap bug https://issues.apache.org/jira/browse/HIVE-6707 https://issues.apache.org/jira/browse/HIVE-6714 https://issues.apache.org/jira/browse/HIVE-6711 Following is NPE bug with orc struct https://issues.apache.org/jira/browse/HIVE-6716 Thanks Prasanth Jayachandran On Mar 14, 2014, at 6:26 PM, Eugene Koifman ekoif...@hortonworks.com wrote: could you add https://issues.apache.org/jira/browse/HIVE-6676 please. It's a blocker as well. Thanks, Eugene On Fri, Mar 14, 2014 at 5:30 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: Harish, Can we have this in as well: https://issues.apache.org/jira/browse/HIVE-6660. Blocker bug in my opinion. Thanks, --Vaibhav On Fri, Mar 14, 2014 at 2:21 PM, Thejas Nair the...@hortonworks.com wrote: Harish, Can you also include HIVE-6673 https://issues.apache.org/jira/browse/HIVE-6673 - show grant statement for all principals throws NPE This variant of 'show grant' is very useful, and the fix for NPE is straightforward. It is patch available now. On Fri, Mar 14, 2014 at 10:25 AM, Yin Huai huaiyin@gmail.com wrote: Guys, Seems ConditionalResolverCommonJoin is not working correctly? I created https://issues.apache.org/jira/browse/HIVE-6668 and set it as a blocker. thanks, Yin On Fri, Mar 14, 2014 at 11:34 AM, Thejas Nair the...@hortonworks.com wrote: Can you also add HIVE-6647 https://issues.apache.org/jira/browse/HIVE-6647 to the list? It is marked as a blocker for 0.13. It has a necessary version number upgrade for HS2. It is ready to be committed. On Fri, Mar 14, 2014 at 12:38 AM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Harish Can you please make the following changes to my earlier request? HIVE-4177 is not required.. instead the same work is tracked under HIVE-6578. Can you also consider HIVE-6656? HIVE-6656 is bug fix for ORC reader when reading timestamp nanoseconds. 
This bug exists in earlier versions as well, so it will be good to have this fixed in 0.13.0. Thanks Prasanth Jayachandran On Mar 13, 2014, at 8:52 AM, Thejas Nair the...@hortonworks.com wrote: Harish, I think we should include the following - HIVE-6547 - This is a cleanup of metastore api changes introduced in 0.13. This can't be done post release. I will get a patch out in a few hours. HIVE-6567 - fixes an NPE in 'show grant .. on all HIVE-6629 - change in syntax for 'set role none'. Marked as a blocker bug. On Tue, Mar 11, 2014 at 8:39 AM, Harish Butani hbut...@hortonworks.com wrote: yes sure. On Mar 10, 2014, at 3:55 PM, Gopal V gop...@apache.org wrote: Can I add HIVE-6518 as well to the merge queue on https://cwiki.apache.org/confluence/display/Hive/Hive+0.13+release+status It is a relatively simple OOM safety patch to vectorized group-by. Tests pass locally for vec group-by, but the pre-commit tests haven't fired even though it's been PA for a while now. Cheers, Gopal