[jira] [Created] (HIVE-25757) Use cached database type to choose metastore backend queries
Yongzhi Chen created HIVE-25757: --- Summary: Use cached database type to choose metastore backend queries Key: HIVE-25757 URL: https://issues.apache.org/jira/browse/HIVE-25757 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In HIVE-21075, we use DatabaseProduct.determineDatabaseProduct, which can be expensive. We should cache the database type instead and use the cached value to choose the metastore backend queries. -- This message was sent by Atlassian Jira (v8.20.1#820001)
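For illustration, here is a minimal sketch of the caching idea, assuming a detection call like DatabaseProduct.determineDatabaseProduct that is safe to memoize; the class and helper names are hypothetical, not the actual patch:
{noformat}
// Hypothetical sketch: cache the detected database product so the expensive
// detection runs only once per process (double-checked locking on a volatile).
public final class CachedDatabaseProduct {
  private static volatile String cachedProduct; // e.g. "POSTGRES", "ORACLE"

  public static String get(String jdbcUrl) {
    String product = cachedProduct;
    if (product == null) {
      synchronized (CachedDatabaseProduct.class) {
        if (cachedProduct == null) {
          // stands in for DatabaseProduct.determineDatabaseProduct(...)
          cachedProduct = detectFromUrl(jdbcUrl);
        }
        product = cachedProduct;
      }
    }
    return product;
  }

  private static String detectFromUrl(String url) {
    if (url.startsWith("jdbc:postgresql:")) return "POSTGRES";
    if (url.startsWith("jdbc:oracle:")) return "ORACLE";
    if (url.startsWith("jdbc:mysql:")) return "MYSQL";
    return "OTHER";
  }
}
{noformat}
This assumes the backend does not change for the lifetime of the process, which holds for a metastore talking to a single backing RDBMS.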
[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2
Yongzhi Chen created HIVE-25238: --- Summary: Make excluded SSL cipher suites configurable for Hive Web UI and HS2 Key: HIVE-25238 URL: https://issues.apache.org/jira/browse/HIVE-25238 Project: Hive Issue Type: Improvement Components: HiveServer2, Web UI Reporter: Yongzhi Chen When starting a Jetty HTTP server, one can explicitly exclude certain (insecure) SSL cipher suites. This can be especially important when Hive needs to be compliant with security regulations. We need to add properties so that the Hive Web UI and HiveServer2 support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
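For illustration, a minimal sketch of how such a property could be wired into Jetty; the configuration plumbing is hypothetical, while setExcludeCipherSuites is the standard Jetty API:
{noformat}
import org.eclipse.jetty.util.ssl.SslContextFactory;

// Hypothetical sketch: feed a comma-separated config value into Jetty's
// cipher-suite exclusion list before starting the HTTP server.
public class WebUiSslSketch {
  public static SslContextFactory.Server buildSslFactory(String excludedSuitesConf) {
    SslContextFactory.Server ssl = new SslContextFactory.Server();
    ssl.setKeyStorePath("/path/to/keystore"); // placeholder path
    if (excludedSuitesConf != null && !excludedSuitesConf.isEmpty()) {
      // Jetty accepts exact suite names or regular expressions here.
      ssl.setExcludeCipherSuites(excludedSuitesConf.split(","));
    }
    return ssl;
  }
}
{noformat}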
[jira] [Created] (HIVE-25211) Create database throws NPE
Yongzhi Chen created HIVE-25211: --- Summary: Create database throws NPE Key: HIVE-25211 URL: https://issues.apache.org/jira/browse/HIVE-25211 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0 Reporter: Yongzhi Chen <11>1 2021-06-06T17:32:48.964Z metastore-0.metastore-service.warehouse-1622998329-9klr.svc.cluster.local metastore 1 5ad83e8e-bf89-4ad3-b1fb-51c73c7133b7 [mdc@18060 class="metastore.RetryingHMSHandler" level="ERROR" thread="pool-9-thread-16"] MetaException(message:java.lang.NullPointerException) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:8115) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1629) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) at com.sun.proxy.$Proxy31.create_database(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16795) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:16779) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:120) at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:128) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:491) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:480) at org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:476) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1556) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$9.run(HiveMetaStore.java:1554) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database_core(HiveMetaStore.java:1554) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1618) ... 21 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24552) Possible HMS connections leak or accumulation in loadDynamicPartitions
Yongzhi Chen created HIVE-24552: --- Summary: Possible HMS connections leak or accumulation in loadDynamicPartitions Key: HIVE-24552 URL: https://issues.apache.org/jira/browse/HIVE-24552 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen When loadDynamicPartitions (Hive.java) is called, it spawns several threads to handle file moves. These threads may open HiveMetaStore connections that are not closed in time, causing many connections to accumulate. The following log was captured after running insert overwrite many times; you can see these threads opening new HMS connections, and the total number of open connections is large. The Finalizer closes the connections, sometimes with errors: {noformat} <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] Opened a connection to metastore, current connections: 44021 <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] Connected to metastore. <14>1 2020-12-15T17:06:15.894Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-14"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] Opened a connection to metastore, current connections: 44022 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] Connected to metastore. <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-5"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] Opened a connection to metastore, current connections: 44023 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] Connected to metastore.
<14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.RetryingMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-6"] RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/dwx-env-mdr...@halxg.cloudera.com (auth:KERBEROS) retries=24 delay=5 lifetime=0 <14>1 2020-12-15T17:06:15.895Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="load-dynamic-partitionsToAdd-3"] Opened a connection to metastore, current connections: 44024 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43904 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43903 <14>1 2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a con
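A hedged sketch of the cleanup direction: if each pool worker that obtained the thread-local Hive client releases it explicitly, the connections no longer pile up waiting for the Finalizer. Hive.get() and Hive.closeCurrent() are the existing APIs; the worker class itself is illustrative:
{noformat}
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;

// Illustrative worker: close the thread-local client deterministically.
public class FileMoveWorkerSketch implements Runnable {
  @Override
  public void run() {
    try {
      Hive db = Hive.get(); // may open a new HMS connection on this thread
      // ... perform the file move / partition work with db ...
    } catch (HiveException e) {
      throw new RuntimeException(e);
    } finally {
      Hive.closeCurrent(); // release the connection instead of relying on finalize()
    }
  }
}
{noformat}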
[jira] [Created] (HIVE-24392) Send table id in get_partitions_by_names_req api
Yongzhi Chen created HIVE-24392: --- Summary: Send table id in get_partitions_by_names_req api Key: HIVE-24392 URL: https://issues.apache.org/jira/browse/HIVE-24392 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Table id is not part of the get_partitions_by_names_req API Thrift definition; this Jira adds it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24292) Hive Web UI should support keystore type by config
Yongzhi Chen created HIVE-24292: --- Summary: Hive Web UI should support keystore type by config Key: HIVE-24292 URL: https://issues.apache.org/jira/browse/HIVE-24292 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen We need a property to pass in the keystore type for the Web UI too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-24253) HMS needs to support keystore/truststore types besides JKS
Yongzhi Chen created HIVE-24253: --- Summary: HMS needs to support keystore/truststore types besides JKS Key: HIVE-24253 URL: https://issues.apache.org/jira/browse/HIVE-24253 Project: Hive Issue Type: Bug Components: Standalone Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen When HiveMetaStoreClient connects to HMS with SSL enabled, HMS should support the default keystore type specified for the JDK and not always use JKS. As HIVE-23958 did for Hive, HMS should support setting additional keystore/truststore types used by different applications, for example for FIPS crypto algorithms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
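For illustration, a minimal sketch of honoring a configured store type instead of hard-coding JKS; the configuration plumbing is hypothetical, while KeyStore.getInstance and KeyStore.getDefaultType are standard JDK APIs:
{noformat}
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;

// Illustrative sketch: use a configured store type (e.g. a FIPS type)
// and fall back to the JDK default when none is configured.
public class KeyStoreLoaderSketch {
  public static KeyStore load(String path, char[] password, String configuredType)
      throws Exception {
    String type = (configuredType == null || configuredType.isEmpty())
        ? KeyStore.getDefaultType() // JDK default instead of a hard-coded "JKS"
        : configuredType;
    KeyStore ks = KeyStore.getInstance(type);
    try (InputStream in = new FileInputStream(path)) {
      ks.load(in, password);
    }
    return ks;
  }
}
{noformat}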
[jira] [Created] (HIVE-24236) Connection leak in TxnHandler
Yongzhi Chen created HIVE-24236: --- Summary: Connection leak in TxnHandler Key: HIVE-24236 URL: https://issues.apache.org/jira/browse/HIVE-24236 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen We see failures in QE tests with 'cannot allocate connection' errors. The exception stack looks as follows: {noformat} 2020-09-29T18:44:26,563 INFO [Heartbeater-0]: txn.TxnHandler (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, general error (SQLState=null, ErrorCode=0) 2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to select from transaction database org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general error at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118) at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605) at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452) at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy63.heartbeat(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247) at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213) at com.sun.proxy.$Proxy64.heartbeat(Unknown Source) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1112) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ...
29 more ) at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2747) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452) at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source) {noformat} and {noformat} Caused by: java.util.NoSuchElementException: Timeout waiting for idle object at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1134) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) ... 53 more ) at org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:3375) at org.apache.hadoop.hive.metastore.AcidEventListener.onDropTable(AcidEventListener.java:65) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$19.notify(MetaStoreListenerNotifier.java:103) at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent
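The general fix direction for leaks like this is to guarantee that every code path, including exception paths, returns the pooled connection. A minimal sketch with try-with-resources; the query and class names are illustrative, not the actual TxnHandler code:
{noformat}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Illustrative sketch: the pooled connection is returned even on exceptions.
public class TxnQuerySketch {
  public static long countOpenTxns(DataSource pool) throws Exception {
    try (Connection conn = pool.getConnection();
         PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM TXNS");
         ResultSet rs = ps.executeQuery()) {
      return rs.next() ? rs.getLong(1) : 0L;
    } // rs, ps, and conn are all closed here, in reverse order
  }
}
{noformat}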
[jira] [Created] (HIVE-22461) NPE Metastore Transformer
Yongzhi Chen created HIVE-22461: --- Summary: NPE Metastore Transformer Key: HIVE-22461 URL: https://issues.apache.org/jira/browse/HIVE-22461 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.1.2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The stack looks as follows: {noformat} 2019-10-08 18:09:12,198 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [pool-6-thread-328]: Starting translation for processor Hiveserver2#3.1.2000.7.0.2.0...@vc0732.halxg.cloudera.com on list 1 2019-10-08 18:09:12,198 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-328]: java.lang.NullPointerException at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-10-08 18:09:12,199 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-6-thread-328]: Error occurred during processing of message. java.lang.NullPointerException: null at org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transform(MetastoreDefaultTransformer.java:99) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3391) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3352) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at com.sun.proxy.$Proxy28.get_table_req(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16633) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_req.getResult(ThriftHiveMetastore.java:16617) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.2000.7.0.2.0-59.jar:3.1.2000.7.0.2.0-59
[jira] [Created] (HIVE-21840) Hive Metastore Translation: Bucketed table Readonly capability
Yongzhi Chen created HIVE-21840: --- Summary: Hive Metastore Translation: Bucketed table Readonly capability Key: HIVE-21840 URL: https://issues.apache.org/jira/browse/HIVE-21840 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam Impala needs a new capability to indicate that only reads are supported for bucketed tables, no matter whether the table is managed or external, ACID or not. Also, in the current implementation, when HIVEBUCKET2 is not in the capabilities list, a bucketed external table is returned as an un-bucketed one; we need a way to know it was "downgraded" from a bucketed table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21839) Hive Metastore Translation: Hive needs to create a type of table if the client does not have the write capability for it
Yongzhi Chen created HIVE-21839: --- Summary: Hive Metastore Translation: Hive needs to create a type of table if the client does not have the write capability for it Key: HIVE-21839 URL: https://issues.apache.org/jira/browse/HIVE-21839 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam When a client tries to create a type of table for which it lacks the write capability, Hive can either return an error message or provide an API call to check the permission even without a table instance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21838) Hive Metastore Translation: Add API call to tell client why table has limited access
Yongzhi Chen created HIVE-21838: --- Summary: Hive Metastore Translation: Add API call to tell client why table has limited access Key: HIVE-21838 URL: https://issues.apache.org/jira/browse/HIVE-21838 Project: Hive Issue Type: New Feature Reporter: Yongzhi Chen Assignee: Naveen Gangam When a table access type is Read-only or None, we need a way to tell clients why. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69672: HIVE-21045: Add total API timing stats and connection pool stats to metrics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69672/#review211886 --- Ship it! Ship It! - Yongzhi Chen On Jan. 5, 2019, 12:41 a.m., Karthik Manamcheri wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69672/ > --- > > (Updated Jan. 5, 2019, 12:41 a.m.) > > > Review request for hive, Adam Holley, Morio Ramdenbourg, Naveen Gangam, and > Vihang Karajgaonkar. > > > Repository: hive-git > > > Description > --- > > HIVE-21045: Add total API timing stats and connection pool stats to metrics > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java > dfd7abff85 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java > 7e33c519a8 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProvider.java > 6dc63fb3bc > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProviderFactory.java > 5a92e104be > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java > 7fe487b184 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java > 8f6ae57e36 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java > 3b188f83af > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java > a2def26fc5 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 2a6290315a > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java > 6ae7f50471 > > > Diff: https://reviews.apache.org/r/69672/diff/1/ > > > Testing > --- > > Manual testing to verify that the new metrics show up for hikaricp, bonecp, > and also the total stats. Here are samples of > 1. [HikariCP json metrics > sample](https://gist.github.com/kmanamcheri/48ff2a680e85c7e925a6f95a9384dcef) > 2. [BoneCP json metrics > sample](https://gist.github.com/kmanamcheri/b005f68263a1a1be06b25156a159d975) > > In both the reports note that there are pool gauges (for tracking the > connection pool info) and also a timer for total api calls. > > > Thanks, > > Karthik Manamcheri > >
Re: Review Request 69672: HIVE-21045: Add total API timing stats and connection pool stats to metrics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69672/#review211805 --- standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java Line 228 (original), 227 (patched) <https://reviews.apache.org/r/69672/#comment297398> This is a little bit different from the old implementation functionally: the old implementation returns null if there are no custom properties, whereas the new implementation still returns the provider. The old implementation has a kind of sanity check, but if the custom properties are not required here, it should be fine. standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java Lines 101 (patched) <https://reviews.apache.org/r/69672/#comment297397> If registry is null, should we give a warning in the log? - Yongzhi Chen On Jan. 5, 2019, 12:41 a.m., Karthik Manamcheri wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69672/ > --- > > (Updated Jan. 5, 2019, 12:41 a.m.) > > > Review request for hive, Adam Holley, Morio Ramdenbourg, Naveen Gangam, and > Vihang Karajgaonkar. > > > Repository: hive-git > > > Description > --- > > HIVE-21045: Add total API timing stats and connection pool stats to metrics > > > Diffs > - > > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java > dfd7abff85 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/BoneCPDataSourceProvider.java > 7e33c519a8 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProvider.java > 6dc63fb3bc > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DataSourceProviderFactory.java > 5a92e104be > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java > 7fe487b184 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java > 8f6ae57e36 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/MetricsConstants.java > 3b188f83af > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/PerfLogger.java > a2def26fc5 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 2a6290315a > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java > 6ae7f50471 > > > Diff: https://reviews.apache.org/r/69672/diff/1/ > > > Testing > --- > > Manual testing to verify that the new metrics show up for hikaricp, bonecp, > and also the total stats. Here are samples of > 1. [HikariCP json metrics > sample](https://gist.github.com/kmanamcheri/48ff2a680e85c7e925a6f95a9384dcef) > 2. [BoneCP json metrics > sample](https://gist.github.com/kmanamcheri/b005f68263a1a1be06b25156a159d975) > > In both the reports note that there are pool gauges (for tracking the > connection pool info) and also a timer for total api calls. > > > Thanks, > > Karthik Manamcheri > >
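For illustration, a sketch of the null-registry handling suggested in the comment above; the class and metric names are hypothetical, while MetricRegistry.register and Gauge are the standard Codahale Metrics APIs:
{noformat}
import java.util.concurrent.atomic.AtomicInteger;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch: register a pool gauge only when a registry exists,
// and warn (rather than silently skip) when it is null.
public class PoolMetricsSketch {
  private static final Logger LOG = LoggerFactory.getLogger(PoolMetricsSketch.class);

  public static void registerActiveGauge(MetricRegistry registry, String poolName,
                                         AtomicInteger activeConnections) {
    if (registry == null) {
      LOG.warn("No metric registry configured; skipping gauges for pool {}", poolName);
      return;
    }
    registry.register(MetricRegistry.name(poolName, "active"),
        (Gauge<Integer>) activeConnections::get);
  }
}
{noformat}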
[jira] [Created] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB
Yongzhi Chen created HIVE-21075: --- Summary: Metastore: Drop partition performance downgrade with Postgres DB Key: HIVE-21075 URL: https://issues.apache.org/jira/browse/HIVE-21075 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 3.0.0 Reporter: Yongzhi Chen To work around the performance issue caused by Oracle not supporting the LIMIT statement, HIVE-9447 makes every backend DB run select count(1) from SDS where SDS.CD_ID=? to check whether the specific CD_ID is referenced in the SDS table before dropping a partition. This select count(1) statement does not scale well on Postgres, and there is no index on the CD_ID column of the SDS table. For an SDS table with 1.5 million rows, select count(1) averages 700ms without an index, versus 10-20ms with one. By contrast, the statement used before HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than 10ms. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
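A hedged sketch of the per-backend query choice this implies; the enum and SQL strings are illustrative, not the actual patch:
{noformat}
// Illustrative sketch: pick an existence check per backend instead of a
// one-size-fits-all COUNT(1), which scans all matching rows.
public class ColumnDescriptorCheckSketch {
  enum DbType { ORACLE, POSTGRES, MYSQL, OTHER }

  static String cdInUseQuery(DbType db) {
    switch (db) {
      case ORACLE:
        // Oracle (before 12c) has no LIMIT; ROWNUM bounds the scan instead.
        return "SELECT 1 FROM SDS WHERE CD_ID = ? AND ROWNUM <= 1";
      case POSTGRES:
      case MYSQL:
        // LIMIT 1 stops at the first matching row.
        return "SELECT 1 FROM SDS WHERE CD_ID = ? LIMIT 1";
      default:
        return "SELECT COUNT(1) FROM SDS WHERE CD_ID = ?";
    }
  }
}
{noformat}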
[jira] [Created] (HIVE-21019) Fix autoColumnStats tests to make auto stats gather possible.
Yongzhi Chen created HIVE-21019: --- Summary: Fix autoColumnStats tests to make auto stats gather possible. Key: HIVE-21019 URL: https://issues.apache.org/jira/browse/HIVE-21019 Project: Hive Issue Type: Bug Components: Test Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Before https://issues.apache.org/jira/browse/HIVE-20915, the sort-dynamic-partitions optimization was turned off for these tests, so the tests could have a group-by in the query plan, which can trigger computing statistics. After that Jira, the optimization is enabled and the query plan no longer has a group-by, but a reduce sorting operation instead. In order to test the auto column stats gather feature, we should disable sort dynamic partitions for these tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20915) Make dynamic sort partition optimization available to HoS and MR
Yongzhi Chen created HIVE-20915: --- Summary: Make dynamic sort partition optimization available to HoS and MR Key: HIVE-20915 URL: https://issues.apache.org/jira/browse/HIVE-20915 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen HIVE-20703 put the dynamic sort partition optimization under a cost-based decision, but it also made the optimization available only to Tez. hive.optimize.sort.dynamic.partition has worked with other execution engines for a long time; we should keep the optimization available to them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20741) Disable or fix random failed tests
Yongzhi Chen created HIVE-20741: --- Summary: Disable or fix random failed tests Key: HIVE-20741 URL: https://issues.apache.org/jira/browse/HIVE-20741 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Two qfile tests for TestCliDriver fail randomly; both may relate to numeric precision issues: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_context_ngrams] (batchId=79) Error: Client Execution succeeded but contained differences (error code = 1) after executing udaf_context_ngrams.q 43c43 < [{"ngram":["travelling"],"estfrequency":1.0}] --- > [{"ngram":["travelling"],"estfrequency":3.0}] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_corr] (batchId=84) Client Execution succeeded but contained differences (error code = 1) after executing udaf_corr.q 100c100 < 0.6633880657639324 --- > 0.6633880657639326 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20695) HoS Query fails with hive.exec.parallel=true
Yongzhi Chen created HIVE-20695: --- Summary: HoS Query fails with hive.exec.parallel=true Key: HIVE-20695 URL: https://issues.apache.org/jira/browse/HIVE-20695 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.1 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Hive queries fail when running a Hive-on-Spark job with hive.exec.parallel=true: {noformat} ERROR : Failed to execute spark task, with exception 'java.lang.Exception(Failed to submit Spark work, please retry later)' java.lang.Exception: Failed to submit Spark work, please retry later at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.execute(RemoteHiveSparkClient.java:186) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:107) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:99) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/dbname/_spark_session_dir/e202c452-8793-4e4e-ad55-61e3d4965c69/somename.jar (inode 725730760): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-1981084042_486659, pending creates: 7] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3755) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3556) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3412) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:688) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20016) Investigate random test failure
Yongzhi Chen created HIVE-20016: --- Summary: Investigate random test failure Key: HIVE-20016 URL: https://issues.apache.org/jira/browse/HIVE-20016 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 4.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3 failed with: java.lang.AssertionError: Concurrent Statement failed: org.apache.hive.service.cli.HiveSQLException: java.lang.AssertionError: Authorization plugins not initialized! at org.junit.Assert.fail(Assert.java:88) at org.apache.hive.jdbc.TestJdbcWithMiniHS2.finishTasks(TestJdbcWithMiniHS2.java:374) at org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation3(TestJdbcWithMiniHS2.java:304) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19897) Add more tests for parallel compilation
Yongzhi Chen created HIVE-19897: --- Summary: Add more tests for parallel compilation Key: HIVE-19897 URL: https://issues.apache.org/jira/browse/HIVE-19897 Project: Hive Issue Type: Test Components: HiveServer2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The two parallel compilation tests in org.apache.hive.jdbc.TestJdbcWithMiniHS2 do not really cover the case of queries compiling concurrently from different connections. Not sure whether this is intentional or a mistake. Add more tests to cover the case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
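For illustration, a minimal sketch of what real concurrent-compilation coverage looks like: each worker opens its own JDBC connection so the compilations genuinely race across sessions. The URL and query are placeholders:
{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: four sessions compile the same query concurrently.
public class ParallelCompilationSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Void>> futures = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      futures.add(pool.submit((Callable<Void>) () -> {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
          stmt.execute("SELECT COUNT(*) FROM src"); // compiled per connection
        }
        return null;
      }));
    }
    for (Future<Void> f : futures) {
      f.get(); // propagate any failure to the test
    }
    pool.shutdown();
  }
}
{noformat}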
Re: [VOTE] Stricter commit guidelines
+1 On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth wrote: > +1 > > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez < > jcama...@apache.org> wrote: > > > After work has been done to ignore most of the tests that were failing > > consistently/intermittently [1], I wanted to start this vote to gather > > support from the community to be stricter wrt committing patches to Hive. > > The committers guide [2] already specifies that a +1 should be obtained > > before committing, but there is another clause that allows committing under > > the presence of flaky tests (clause 4). Flaky tests are as good as having > > no tests, hence I propose to remove clause 4 and enforce the +1 from > > testing infra before committing. > > > > > > > > As I see it, by enforcing that we always get a +1 from the testing infra > > before committing, 1) we will have a more stable project, and 2) we will > > have another incentive as a community to create a more robust testing > > infra, e.g., replacing flaky tests for similar unit tests that are not > > flaky, trying to decrease running time for tests, etc. > > > > > > > > Please, share your thoughts about this. > > > > > > > > Here is my +1. > > > > > > > > Thanks, > > > > Jesús > > > > > > > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805. > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E > > > > [2] https://cwiki.apache.org/confluence/display/Hive/ > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches > > > > > > > > >
[jira] [Created] (HIVE-19296) Add log to record MapredLocalTask Failure
Yongzhi Chen created HIVE-19296: --- Summary: Add log to record MapredLocalTask Failure Key: HIVE-19296 URL: https://issues.apache.org/jira/browse/HIVE-19296 Project: Hive Issue Type: Bug Components: Diagnosability Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In some cases, when MapredLocalTask fails around child process start time, we cannot find the detailed error information anywhere (not in the stderr log, and there is no MapredLocal log file). All we get is: {noformat} *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1 *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Obtaining error information *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: Task failed! Task ID: Stage-48 Logs: *** ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-]: /var/log/hive/hadoop-cmf-hive1-HIVESERVER2-t.log.out *** ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: [HiveServer2-Background-Pool: Thread-]: Execution failed with exit status: 1 {noformat} It is really hard to debug. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
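A hedged sketch of the kind of logging this asks for, assuming access to the child Process object; the class and method names are illustrative:
{noformat}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch: drain and log the child's stderr so a non-zero exit
// status leaves a trail in the HiveServer2 log. Real code should drain
// stdout on a separate thread as well to avoid blocking the child.
public class ChildProcessLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ChildProcessLoggingSketch.class);

  public static int runAndLog(ProcessBuilder pb) throws Exception {
    Process child = pb.start();
    try (BufferedReader err =
             new BufferedReader(new InputStreamReader(child.getErrorStream()))) {
      String line;
      while ((line = err.readLine()) != null) {
        LOG.error("MapredLocalTask child stderr: {}", line);
      }
    }
    int status = child.waitFor();
    if (status != 0) {
      LOG.error("MapredLocalTask child exited with status {}", status);
    }
    return status;
  }
}
{noformat}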
Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66188/#review201323 --- standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java Lines 7730 (patched) <https://reviews.apache.org/r/66188/#comment282498> Should you call addQueryAfterUse and closeAllQueries? Is that how you release the resources held by the batch queries? - Yongzhi Chen On March 21, 2018, 6:57 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66188/ > --- > > (Updated March 21, 2018, 6:57 p.m.) > > > Review request for hive, Alexander Kolbasov and Yongzhi Chen. > > > Repository: hive-git > > > Description > --- > > If the table contains a lot of columns e.g, 5k, simple table rename would > fail with the following stack trace. The issue is datanucleus can't handle > the query with lots of colName='c1' && colName='c2' && ... . > > I'm breaking the query into multiple smaller queries and then we aggregate > the result together. > > > Diffs > - > > ql/src/test/queries/clientpositive/alter_rename_table.q 2061850540 > ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java > PRE-CREATION > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > 6ead20aeaf > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 88d88ed4df > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > 9f822564bd > > > Diff: https://reviews.apache.org/r/66188/diff/2/ > > > Testing > --- > > Manual test has been done for large column of tables. > > > Thanks, > > Aihua Xu > >
[jira] [Created] (HIVE-18671) lock not released after Hive on Spark query was cancelled
Yongzhi Chen created HIVE-18671: --- Summary: lock not released after Hive on Spark query was cancelled Key: HIVE-18671 URL: https://issues.apache.org/jira/browse/HIVE-18671 Project: Hive Issue Type: Bug Affects Versions: 2.3.2 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When a query running on Spark is cancelled, SparkJobMonitor cannot return, and therefore the locks held by the query cannot be released. With debug logging enabled, you will see many log entries like the following: {noformat} 2018-02-09 08:27:09,613 INFO org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: [HiveServer2-Background-Pool: Thread-80]: state = CANCELLED 2018-02-09 08:27:10,613 INFO org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor: [HiveServer2-Background-Pool: Thread-80]: state = CANCELLED {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
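A minimal sketch of the fix direction, where the monitor treats CANCELLED as a terminal state and returns so the caller can release locks; the JobState enum stands in for the real Spark job state:
{noformat}
import java.util.function.Supplier;

// Illustrative sketch: return on CANCELLED instead of looping forever.
public class SparkMonitorSketch {
  enum JobState { RUNNING, SUCCEEDED, FAILED, CANCELLED }

  static int monitor(Supplier<JobState> stateSource) throws InterruptedException {
    while (true) {
      switch (stateSource.get()) {
        case SUCCEEDED:
          return 0;
        case FAILED:
        case CANCELLED: // previously only logged, so the loop never ended
          return 1;
        default:
          Thread.sleep(1000); // poll interval, illustrative
      }
    }
  }
}
{noformat}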
Re: [VOTE] Apache Hive 2.3.2 Release Candidate 0
+1 I verified the release by: . Checked the gpg signature . Checked the md5 files . Installed Hive 2.3.2 and tested commands: show tables; create table; select * from table; The release works fine. On Mon, Nov 13, 2017 at 11:43 AM, Sergio Pena wrote: > +1 > > I verified the release by doing the following: > * checked the gpg signature > * checked the md5 files > * installed hive 2.3.2 in my local machine with hadoop 2.7.2 and run a few > commands: > > show databases > > show tables > > insert into table values() > > select * from table > > select count(*) from table > * checked the maven artifacts are correctly pulled by other components and > run unit tests > * checked that storage-api-2.4.0 is pulled > * checked the release tag > * checked the RELEASE_NOTES, NOTICE, LICENSE are correct > > The release is working correctly. > > Thanks Sahil for making this release. > - Sergio > > On Thu, Nov 9, 2017 at 5:37 PM, Sahil Takiar > wrote: > > > Apache Hive 2.3.2 Release Candidate 0 is available here: > > http://people.apache.org/~stakiar/hive-2.3.2/ > > > > Maven artifacts are available here: > > https://repository.apache.org/content/repositories/orgapachehive-1082/ > > > > Source tag for RCN is at:https://github.com/apache/ hive/tree/release-2.3.2 > > > > Voting will conclude in 72 hours. > > > > Hive PMC Members: Please test and vote. > > > > Thanks. > >
[jira] [Created] (HIVE-17640) Comparison of date returns null if only time part is provided in string.
Yongzhi Chen created HIVE-17640: --- Summary: Comparison of date returns null if only time part is provided in string. Key: HIVE-17640 URL: https://issues.apache.org/jira/browse/HIVE-17640 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Fix For: 2.1.0 Reproduce: select '2017-01-01 00:00:00' < current_date; INFO : OK ... 1 row selected (18.324 seconds) ... NULL -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-16875) Query against view with partitioned child on HoS fails with privilege exception.
Yongzhi Chen created HIVE-16875: --- Summary: Query against view with partitioned child on HoS fails with privilege exception. Key: HIVE-16875 URL: https://issues.apache.org/jira/browse/HIVE-16875 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen A query against a view whose child table has partitions fails with a privilege exception even with correct privileges. Reproduce: {noformat} create table jsamp1 (a string) partitioned by (b int); insert into table jsamp1 partition (b=1) values ("hello"); create view jview as select * from jsamp1; create role viewtester; grant all on table jview to role viewtester; grant role viewtester to group testers; With MR, the select succeeds: set hive.execution.engine=mr; select count(*) from jview; while with Spark: set hive.execution.engine=spark; select count(*) from jview; it fails with: Error: Error while compiling statement: FAILED: SemanticException No valid privileges User tester does not have privileges for QUERY The required privileges: Server=server1->Db=default->Table=j1part->action=select; (state=42000,code=4) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: Welcome Rui Li to Hive PMC
Congrats Rui! On Thu, May 25, 2017 at 1:48 PM, Vineet Garg wrote: > Congrats Rui! > > On May 24, 2017, at 9:19 PM, Xuefu Zhang wrote: > > Hi all, > > It's an honor to announce that Apache Hive PMC has recently voted to invite > Rui Li as a new Hive PMC member. Rui is a long time Hive contributor and > committer, and has made significant contributions in Hive, especially in Hive > on Spark. Please join me in congratulating him and looking forward to a > bigger role that he will play in Apache Hive project. > > Thanks, > Xuefu > >
[jira] [Created] (HIVE-16660) Not able to add partition for views in hive when sentry is enabled
Yongzhi Chen created HIVE-16660: --- Summary: Not able to add partition for views in hive when sentry is enabled Key: HIVE-16660 URL: https://issues.apache.org/jira/browse/HIVE-16660 Project: Hive Issue Type: Bug Components: Parser Reporter: Yongzhi Chen Assignee: Yongzhi Chen Repro: create table tesnit (a int) partitioned by (p int); insert into table tesnit partition (p = 1) values (1); insert into table tesnit partition (p = 2) values (1); create view test_view partitioned on (p) as select * from tesnit where p =1; alter view test_view add partition (p = 2); Error: Error while compiling statement: FAILED: SemanticException [Error 10056]: The query does not reference any valid partition. To run this query, set hive.mapred.mode=nonstrict (state=42000,code=10056) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: Review Request 58992: HIVE-16572: Rename a partition should not drop its column stats
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58992/#review174169 --- Ship it! Ship It! - Yongzhi Chen On May 4, 2017, 2:19 p.m., Chaoyu Tang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58992/ > --- > > (Updated May 4, 2017, 2:19 p.m.) > > > Review request for hive. > > > Bugs: HIVE-16572 > https://issues.apache.org/jira/browse/HIVE-16572 > > > Repository: hive-git > > > Description > --- > > This patch is to fix the issue in renaming a partition. > > > Diffs > - > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > d8af7a7 > ql/src/test/queries/clientpositive/alter_table_column_stats.q 39dfb0c > ql/src/test/queries/clientpositive/rename_external_partition_location.q > be93bd4 > ql/src/test/results/clientpositive/alter_table_column_stats.q.out 8739bfe > ql/src/test/results/clientpositive/rename_external_partition_location.q.out > 1670b4e > > > Diff: https://reviews.apache.org/r/58992/diff/1/ > > > Testing > --- > > Manual tests > new qtests > > > Thanks, > > Chaoyu Tang > >
Re: Review Request 58992: HIVE-16572: Rename a partition should not drop its column stats
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58992/#review174165 --- metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java Line 555 (original), 568 (patched) <https://reviews.apache.org/r/58992/#comment247282> Can the transaction be properly rolled back with the incomplete state? - Yongzhi Chen On May 4, 2017, 2:19 p.m., Chaoyu Tang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58992/ > --- > > (Updated May 4, 2017, 2:19 p.m.) > > > Review request for hive. > > > Bugs: HIVE-16572 > https://issues.apache.org/jira/browse/HIVE-16572 > > > Repository: hive-git > > > Description > --- > > This patch is to fix the issue in renaming a partition. > > > Diffs > - > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > d8af7a7 > ql/src/test/queries/clientpositive/alter_table_column_stats.q 39dfb0c > ql/src/test/queries/clientpositive/rename_external_partition_location.q > be93bd4 > ql/src/test/results/clientpositive/alter_table_column_stats.q.out 8739bfe > ql/src/test/results/clientpositive/rename_external_partition_location.q.out > 1670b4e > > > Diff: https://reviews.apache.org/r/58992/diff/1/ > > > Testing > --- > > Manual tests > new qtests > > > Thanks, > > Chaoyu Tang > >
Re: Review Request 58456: Query cancel: improve the way to handle files
> On April 19, 2017, 5:50 p.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java > > Lines 46 (patched) > > <https://reviews.apache.org/r/58456/diff/1/?file=1692688#file1692688line46> > > > > To be honest, I am not very comfortable to import the Driver here. I > > thought the CombineHiveInputFormat in io package is at a lower architecture > > layer than Driver ql. > > Is there any other way which we can detect if the thread has been > > interrupted (e.g. Thread.getCurrentThread().isInterrupted() etc? > > Also as I recall (if I am right), there might be a class which handles > > this interrupt signal globally, I could not find it at this moment. The check in CombineHiveInputFormat just checks the threadlocal object; it follows the same pattern used to check Hive's own cancel-related status. CombineHiveInputFormat already includes: import org.apache.hadoop.hive.ql.exec.Operator; import org.apache.hadoop.hive.ql.exec.Utilities; so I do not think it is a problem to import org.apache.hadoop.hive.ql.Driver > On April 19, 2017, 5:50 p.m., Chaoyu Tang wrote: > > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java > > Lines 399 (patched) > > <https://reviews.apache.org/r/58456/diff/1/?file=1692689#file1692689line399> > > > > As I understand, basically the cleanup is called with the parameter > > state value CANCELED, TIMEOUT and CLOSED, and here you are trying to > > address the race issue in the normal CLOSE case where the thread should not > > be interrupted and further clean the tmp file. Is it right? > > Another thought, could moving the code > > {code} > > ss.deleteTmpOutputFile(); > > ss.deleteTmpErrOutputFile(); > > {code} > > from sqlOperation to driver close() or destroy() will be help to solve > > the problem? The exception "Failed to clean-up tmp directories." comes from Utilities.clearWork(job) in execute; clearWork cleans the folders used for the map and reduce plan paths. ss.deleteTmpOutputFile(); ss.deleteTmpErrOutputFile(); clean the output data tmp folder, so they are different. /** * Temporary file name used to store results of non-Hive commands (e.g., set, dfs) * and HiveServer.fetch*() function will read results from this file */ protected File tmpOutputFile; /** * Temporary file name used to store error output of executing non-Hive commands (e.g., set, dfs) */ protected File tmpErrOutputFile; - Yongzhi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58456/#review172374 --- On April 14, 2017, 1:14 p.m., Yongzhi Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58456/ > --- > > (Updated April 14, 2017, 1:14 p.m.) > > > Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena. > > > Bugs: HIVE-16426 > https://issues.apache.org/jira/browse/HIVE-16426 > > > Repository: hive-git > > > Description > --- > > 1. Use threadlocal variable to store cancel state to make it is accessible > without being passed around by parameters. > 2. Add checkpoints for file operations. > 3. Remove backgroundHandle.cancel to avoid failed file cleanup because of the > interruption. By what I observed that the method seems not very effective for > scheduled operation, for example, the on going HMS API calls.
> > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > a80004662068eb2391c0dd7062f77156b222375b > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java > 7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java > 04fc0a17c93120b8f6e6d7c36e4d70631d56baca > > > Diff: https://reviews.apache.org/r/58456/diff/1/ > > > Testing > --- > > Manually tested. > > > Thanks, > > Yongzhi Chen > >
Re: Review Request 58456: Query cancel: improve the way to handle files
> On April 14, 2017, 5:43 p.m., Aihua Xu wrote: > > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java > > Lines 399 (patched) > > <https://reviews.apache.org/r/58456/diff/1/?file=1692689#file1692689line399> > > > > I'm not exactly following what we are doing here. Not sure how > > background thread gets closed later. > > > > Otherwise, the other changes look good. The background thread will complete the task or gracefully close, guided by the cancel status. Our current cancel design mostly follows the pattern that the cancel command sets the cancel status, and the working thread (background thread) checks the cancel status and decides to quit or continue. backgroundHandle.cancel(true) does not follow this pattern and causes some conflicts. The following warning log is caused by this: 2017-04-11 09:57:30,727 WARN org.apache.hadoop.hive.ql.exec.Utilities: [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories. java.io.InterruptedIOException: Call interrupted - Yongzhi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58456/#review172009 ------- On April 14, 2017, 1:14 p.m., Yongzhi Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58456/ > --- > > (Updated April 14, 2017, 1:14 p.m.) > > > Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena. > > > Bugs: HIVE-16426 > https://issues.apache.org/jira/browse/HIVE-16426 > > > Repository: hive-git > > > Description > --- > > 1. Use threadlocal variable to store cancel state to make it is accessible > without being passed around by parameters. > 2. Add checkpoints for file operations. > 3. Remove backgroundHandle.cancel to avoid failed file cleanup because of the > interruption. By what I observed that the method seems not very effective for > scheduled operation, for example, the on going HMS API calls. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > a80004662068eb2391c0dd7062f77156b222375b > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java > 7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b > service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java > 04fc0a17c93120b8f6e6d7c36e4d70631d56baca > > > Diff: https://reviews.apache.org/r/58456/diff/1/ > > > Testing > --- > > Manually tested. > > > Thanks, > > Yongzhi Chen > >
Review Request 58456: Query cancel: improve the way to handle files
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58456/ --- Review request for hive, Aihua Xu, Chaoyu Tang, and Sergio Pena. Bugs: HIVE-16426 https://issues.apache.org/jira/browse/HIVE-16426 Repository: hive-git Description --- 1. Use threadlocal variable to store cancel state to make it accessible without being passed around by parameters. 2. Add checkpoints for file operations. 3. Remove backgroundHandle.cancel to avoid failed file cleanup because of the interruption. From what I observed, the method seems not very effective for scheduled operations, for example, ongoing HMS API calls. Diffs - ql/src/java/org/apache/hadoop/hive/ql/Driver.java a80004662068eb2391c0dd7062f77156b222375b ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java b0657f01d4482dc8bb8dc180e5e7deffbdb533e6 ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 7a113bf8e5c4dd8c2c486741a5ebc7b8940e746b service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 04fc0a17c93120b8f6e6d7c36e4d70631d56baca Diff: https://reviews.apache.org/r/58456/diff/1/ Testing --- Manually tested. Thanks, Yongzhi Chen
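For illustration, a minimal sketch of the threadlocal cancel-state pattern described in item 1; the names are hypothetical, not the actual patch. The worker thread registers a shared state object in a ThreadLocal so that deep call sites (file operations, input-format code) can check for cancellation without new parameters, while the cancelling thread flips the same object through its own reference:
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a threadlocal cancel state with checkpoints.
public final class DriverCancelStateSketch {
  private static final ThreadLocal<DriverCancelStateSketch> CURRENT = new ThreadLocal<>();
  private final AtomicBoolean cancelled = new AtomicBoolean(false);

  public static void setCurrent(DriverCancelStateSketch s) { CURRENT.set(s); }
  public static void removeCurrent() { CURRENT.remove(); }

  // Called from the cancelling thread, which holds a direct reference.
  public void cancel() { cancelled.set(true); }

  // Checkpoint for deep call sites running on the worker thread.
  public static void checkCancelled() throws InterruptedException {
    DriverCancelStateSketch s = CURRENT.get();
    if (s != null && s.cancelled.get()) {
      throw new InterruptedException("query was cancelled");
    }
  }
}
{noformat}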
[jira] [Created] (HIVE-16426) Query cancel: improve the way to handle files
Yongzhi Chen created HIVE-16426: --- Summary: Query cancel: improve the way to handle files Key: HIVE-16426 URL: https://issues.apache.org/jira/browse/HIVE-16426 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen Assignee: Yongzhi Chen 1. Add data structure support to make it easy to check the query cancel status. 2. Handle query cancel more gracefully. Remove possible file leaks caused by query cancel, as shown in the following stack: {noformat} 2017-04-11 09:57:30,727 WARN org.apache.hadoop.hive.ql.exec.Utilities: [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories. java.io.InterruptedIOException: Call interrupted at org.apache.hadoop.ipc.Client.call(Client.java:1496) at org.apache.hadoop.ipc.Client.call(Client.java:1439) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy20.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy21.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671) at org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88) at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat}
3. Add checkpoints to related file operations to improve the response time for query cancellation. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
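To make items 1 and 3 above concrete, here is a minimal Java sketch of a cancel checkpoint guarding file cleanup, assuming a shared cancel flag; the names (QueryState, TmpFileCleaner, cleanupTmpDirs) are hypothetical and this is an illustration, not the actual HIVE-16426 patch.
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class QueryState {
  // set to true by the cancel handler on another thread
  final AtomicBoolean cancelled = new AtomicBoolean(false);
}

class TmpFileCleaner {
  void cleanupTmpDirs(List<Path> tmpDirs, QueryState state) throws IOException {
    for (Path dir : tmpDirs) {
      if (state.cancelled.get()) {
        // checkpoint: stop early instead of issuing more delete calls that
        // would be interrupted halfway and leak files
        return;
      }
      Files.deleteIfExists(dir); // stand-in for the (recursive) HDFS delete RPC
    }
  }
}
{noformat}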
Re: Review Request 58203: HIVE-16345 BeeLineDriver should be able to run qtest files which are using default database tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/58203/#review171166 --- itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java Lines 130 (patched) <https://reviews.apache.org/r/58203/#comment244045> How do you handle the case where a command has a comment following ';' and a new command starts after the ';'? Do these cases matter? For example: show tables; --comment show tables; select * from src; The beeline.Commands class has code similar to getCommands: handleMultiLineCmd, and the logic in execute. Could you figure out a way to reuse some of the code there? itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java Lines 160 (patched) <https://reviews.apache.org/r/58203/#comment244048> Is it possible that the table belongs to another database? For example: use foo; select * from tableinfoo; itests/util/src/main/java/org/apache/hive/beeline/qfile/QFileBeeLineClient.java Line 92 (original), 90 (patched) <https://reviews.apache.org/r/58203/#comment244047> Why do we need to replace the tablename with default.tablename? Could you just add a 'use default'? - Yongzhi Chen On April 5, 2017, 10:35 a.m., Peter Vary wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/58203/ > --- > > (Updated April 5, 2017, 10:35 a.m.) > > > Review request for hive, Aihua Xu, Zoltan Haindrich, Yongzhi Chen, and Barna > Zsombor Klara. > > > Bugs: HIVE-16345 > https://issues.apache.org/jira/browse/HIVE-16345 > > > Repository: hive-git > > > Description > --- > > The goal of the change is to run qtest files which contain queries on tables > created by the init scripts. > It adds the possibility to rewrite the src table references to default.src > > This patch contains the following changes: > - Added a new parameter to the driver, to control whether to rewrite the table > names or not (test.rewrite.source.tables) - default is true > - Made QTestUtil.getSrcTables() available for the QFile class > - Run the QFile not with "!run testfile.q", but reading the file, and > assembling the commands - enabling us to parse the queries, and provide better > feedback about the failing queries > - QFile rewrites the source tables, if it is required > - Used 9 qtest files from the CliDriver, and added them to BeeLine tests > - Added new filters, and removed redundant ones - I was able to remove every > QFile specific filter, and corresponding setter methods as well > - Moved QFile classes to the org.apache.hive.beeline package, so it can use > package private methods from BeeLine, and Commands > - Refactored the needsContinuation method in BeeLine, so it can be called from a > static context as well > > And one important change is: > - In Utilities.setMapRedWork, change the INPUT_NAME value in the conf to a > mapreduce task specific value. This one is used by the IOContextMap to cache > the IOContext objects. Using the same value for every mapred task prevented > them from running in the same JVM.
The tests were running sequentially, but failed > randomly in parallel > > > Diffs > - > > beeline/src/java/org/apache/hive/beeline/BeeLine.java 11526a7 > itests/src/test/resources/testconfiguration.properties 7a70c9c > > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java > 0d63f5d > itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2abf252 > itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java ae5a349 > > itests/util/src/main/java/org/apache/hive/beeline/qfile/QFileBeeLineClient.java > 760fde6 > itests/util/src/main/java/org/apache/hive/beeline/qfile/package-info.java > fcd50ec > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 79955e9 > ql/src/test/results/clientpositive/beeline/drop_with_concurrency.q.out > 385f9b7 > ql/src/test/results/clientpositive/beeline/escape_comments.q.out abc0fee > ql/src/test/results/clientpositive/beeline/smb_mapjoin_1.q.out PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_10.q.out > PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_11.q.out > PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_12.q.out > PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_13.q.out > PRE-CREATION > ql/src/test/results/clientpositive/beeline/smb_mapjoin_16.q.out > PRE-CREATION
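To make the first review question concrete, here is a hypothetical Java sketch of splitting a q-file line into commands at ';' while honoring "--" comments; it is not BeeLine's actual parser, and it deliberately ignores quoted strings to stay short.
{noformat}
import java.util.ArrayList;
import java.util.List;

class QSplitter {
  // Splits one q-file line into commands at ';'. A "--" comment runs to the
  // end of the line, so in "show tables; --comment show tables; select * from
  // src;" everything after "--" is swallowed and only "show tables" survives:
  // exactly the ambiguity the review comment asks about.
  static List<String> split(String line) {
    List<String> cmds = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    boolean inComment = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (inComment) {
        if (c == '\n') inComment = false; // comment ends at end of line
        continue;
      }
      if (c == '-' && i + 1 < line.length() && line.charAt(i + 1) == '-') {
        inComment = true;
        i++;
        continue;
      }
      if (c == ';') {
        if (cur.toString().trim().length() > 0) cmds.add(cur.toString().trim());
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    if (cur.toString().trim().length() > 0) cmds.add(cur.toString().trim());
    return cmds;
  }
}
{noformat}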
[jira] [Created] (HIVE-15997) Resource leaks when query is cancelled
Yongzhi Chen created HIVE-15997: --- Summary: Resource leaks when query is cancelled Key: HIVE-15997 URL: https://issues.apache.org/jira/browse/HIVE-15997 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen There may be some resource leaks when a query is cancelled. We see the following stacks in the log: Possible file and folder leaks: {noformat} 2017-02-02 06:23:25,410 WARN hive.ql.Context: [HiveServer2-Background-Pool: Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: "ychencdh511t-1.vpc.cloudera.com":8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) at org.apache.hadoop.ipc.Client.call(Client.java:1476) at org.apache.hadoop.ipc.Client.call(Client.java:1409) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy25.delete(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy26.delete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675) at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671) at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405) at org.apache.hadoop.hive.ql.Context.clear(Context.java:541) at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109) at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88) at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedByInterruptException at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714) at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525) at org.apache.hadoop.ipc.Client.call(Client.java:1448) ... 35 more 2017-02-02 12:26:52,706 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Background-Pool: Thread-23]: Operation is timed out,operation=OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdenti
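A minimal sketch of one possible mitigation for the trace above, assuming the leak comes from running cleanup RPCs on an already-interrupted thread (ClosedByInterruptException); the class and method names are hypothetical and this is an illustration, not the actual HIVE-15997 fix.
{noformat}
import java.nio.file.Files;
import java.nio.file.Path;

class ScratchDirCleaner {
  void removeScratchDir(Path scratchDir) {
    // Thread.interrupted() both reads and clears the interrupt flag, so the
    // delete below is not aborted by the pending interrupt.
    boolean wasInterrupted = Thread.interrupted();
    try {
      Files.deleteIfExists(scratchDir); // stand-in for the HDFS delete RPC
    } catch (Exception e) {
      System.err.println("Error removing scratch dir: " + e);
    } finally {
      if (wasInterrupted) {
        Thread.currentThread().interrupt(); // restore the flag for callers
      }
    }
  }
}
{noformat}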
[jira] [Created] (HIVE-15735) In some cases, view objects inside a view do not have parents
Yongzhi Chen created HIVE-15735: --- Summary: In some cases, view objects inside a view do not have parents Key: HIVE-15735 URL: https://issues.apache.org/jira/browse/HIVE-15735 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen This causes Sentry to throw a "No valid privileges" error: Error: Error while compiling statement: FAILED: SemanticException No valid privileges. To reproduce, with Sentry enabled: create table t1( i int); create view v1 as select * from t1; create view v2 as select * from v1 union all select * from v1; If the user does not have read permission on t1 and v1, the query select * from v2; will fail with: Error: Error while compiling statement: FAILED: SemanticException No valid privileges User foo does not have privileges for QUERY The required privileges: Server=server1->Db=database2->Table=v1->action=select; (state=42000,code=4) Sentry should not check v1's permissions, because v1 has at least one parent (v2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55623/#review162147 --- Ship it! Ship It! - Yongzhi Chen On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55623/ > --- > > (Updated Jan. 17, 2017, 3:02 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-15617: Improve the avg performance for Range based window > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java > 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java > f5f9f7bb8980636fa364001c5508c215b304b9eb > > Diff: https://reviews.apache.org/r/55623/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window
> On Jan. 17, 2017, 3:46 p.m., Yongzhi Chen wrote: > > Could you add a test case in which the range size is 0 for avg? - Yongzhi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55623/#review161875 --- On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55623/ > --- > > (Updated Jan. 17, 2017, 3:02 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-15617: Improve the avg performance for Range based window > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java > 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java > f5f9f7bb8980636fa364001c5508c215b304b9eb > > Diff: https://reviews.apache.org/r/55623/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
Re: Review Request 55623: HIVE-15617: Improve the avg performance for Range based window
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55623/#review161875 --- ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java (line 132) <https://reviews.apache.org/r/55623/#comment233154> Is it possible that sum is not null while numRows == 0? - Yongzhi Chen On Jan. 17, 2017, 3:02 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55623/ > --- > > (Updated Jan. 17, 2017, 3:02 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-15617: Improve the avg performance for Range based window > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java > 5ad5c0628f19dabf17191c08e0b14f8e2b1391e8 > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/BasePartitionEvaluator.java > f5f9f7bb8980636fa364001c5508c215b304b9eb > > Diff: https://reviews.apache.org/r/55623/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
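The guard the comment asks about, as a small Java sketch with hypothetical names: when a range window is empty, numRows is 0 and sum may be null, so the average must be null rather than a division.
{noformat}
class WindowAvg {
  // Returns null for an empty window instead of dividing by zero; also
  // covers the case raised above where sum is non-null but numRows == 0.
  static Double rangeAvg(Double sum, long numRows) {
    if (numRows == 0 || sum == null) {
      return null;
    }
    return sum / numRows;
  }
}
{noformat}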
Re: Review Request 55479: Improve canceling response time for acquiring locks
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55479/ --- (Updated Jan. 13, 2017, 5:14 p.m.) Review request for hive, Aihua Xu and Chaoyu Tang. Changes --- New patch fixed issues found by the review Bugs: HIVE-15572 https://issues.apache.org/jira/browse/HIVE-15572 Repository: hive-git Description --- 1. Add a data structure to pass the driver state. 2. Check the driver state when acquiring locks via ZooKeeper. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/Driver.java fd6020b85591ea190aa33ae9f2dc925a38fc7471 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 721974db03f1f29bdb84f41db317e37a6a78ca32 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 45ead16560ce7514a1ab6f4ac2de6771582a8a73 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 24fbd9af5fb7be6b238c6ed246e360477d3c47de ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 20e114776f143715d5820e6a1acb794a9d6de02c ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java b2eb99775c220e9ce347fa1cb918ebf4e738eac2 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java ce220a21de01a188da940e4511ee6876d0c15a4a ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java ed022d9193f14436ed527f9cbd3df45d48857cf4 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 14d0ef4e27e0518c1bafcbdcde12f09e101a3321 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java e189d383b6d090ce151b6ab30fb240c261430239 Diff: https://reviews.apache.org/r/55479/diff/ Testing --- Unit test Manual test Thanks, Yongzhi Chen
Re: Review Request 55479: Improve canceling response time for acquiring locks
> On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > The enum DriverState and the lock sometimes work with code outside, so they cannot be totally encapsulated. > On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 205 > > <https://reviews.apache.org/r/55479/diff/1/?file=1604041#file1604041line205> > > > > I wonder if it might look cleaner if we have an inner class called > > DriverState similar to this LockedDriverState. But all driver state related > > stuffs such as enum, lock are encapsulated in this class. It provides the > > methods for state transition etc. The lock and state sometimes work with code outside, so they cannot be fully encapsulated. > On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java, > > line 189 > > <https://reviews.apache.org/r/55479/diff/1/?file=1604049#file1604049line189> > > > > as I commented before, the DriverState class might provide this method > > for inspecting this state, which looks better. This is a simplified condition check to minimize lock time; the strict check would be: lock(); if the state is not interrupted, acquire the lock from ZooKeeper; unlock(). And the private boolean isInterrupted() in Driver.java is different from this one: in Driver.java it interrupts the current thread, here we do not. So if we encapsulate the method, we lose the flexibility. > On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java, line 184 > > <https://reviews.apache.org/r/55479/diff/1/?file=1604044#file1604044line184> > > > > Race condition here: > > if hiveLocks == null was caused by the interruption, but when the code > > executes this step, the state was just changed to be interrupted, then the > > exception msg will not be right. I thought about this; in our code we tolerate the race condition a little bit to improve performance. The worst case is that the error message becomes: Locks on the underlying objects cannot be acquired. Otherwise, the driver-state lock would have to be held for the whole acquireLocks method. > On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java, line 1134 > > <https://reviews.apache.org/r/55479/diff/1/?file=1604041#file1604041line1134> > > > > nit: need a space between userFromUGI,lDrvState The new patch will fix the issue. > On Jan. 13, 2017, 1:17 a.m., Chaoyu Tang wrote: > > ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java, line 454 > > <https://reviews.apache.org/r/55479/diff/1/?file=1604042#file1604042line454> > > > > should be "Query was cancelled while acquiring locks on the underlying > > objects."? The new patch will fix the issue. - Yongzhi ------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55479/#review161468 --- On Jan. 12, 2017, 11:21 p.m., Yongzhi Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55479/ > --- > > (Updated Jan. 12, 2017, 11:21 p.m.) > > > Review request for hive, Aihua Xu and Chaoyu Tang. > > > Bugs: HIVE-15572 > https://issues.apache.org/jira/browse/HIVE-15572 > > > Repository: hive-git > > > Description > --- > > 1. Add a data structure to pass the driver state. > 2. Check the driver state when acquiring locks via ZooKeeper.
> > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > fd6020b85591ea190aa33ae9f2dc925a38fc7471 > ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java > 721974db03f1f29bdb84f41db317e37a6a78ca32 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java > 45ead16560ce7514a1ab6f4ac2de6771582a8a73 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java > 24fbd9af5fb7be6b238c6ed246e360477d3c47de > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java > 20e114776f143715d5820e6a1acb794a9d6de02c > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java > b2eb99775c220e9ce347fa1cb918ebf4e738eac2 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java > ce220a21de01a188da940e4511ee6876d0c15a4a > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java > ed022d9193f14436ed527f9cbd3df45d48857cf4 > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java > 14d0ef4e27e0518c1bafcbdcde12f09e101a3321 > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java > e189d383b6d090ce151b6ab30fb240c261430239 > > Diff: https://reviews.apache.org/r/55479/diff/ > > > Testing > --- > > Unit test > Manual test > > > Thanks, > > Yongzhi Chen > >
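A rough Java sketch of the approach discussed in this review, with hypothetical names: peek at a shared driver state before each (slow) ZooKeeper lock request and bail out if the query was cancelled. This is the "simplified condition check" trade-off described above, not the committed HIVE-15572 patch; at worst one extra lock is acquired before the cancellation is noticed.
{noformat}
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

enum DriverState { RUNNING, INTERRUPTED }

class LockAcquirer {
  private final AtomicReference<DriverState> state =
      new AtomicReference<>(DriverState.RUNNING);

  // called from the cancel/timeout thread
  void cancel() {
    state.set(DriverState.INTERRUPTED);
  }

  void acquireLocks(List<String> lockPaths) throws InterruptedException {
    for (String path : lockPaths) {
      // unsynchronized peek: cheap, tolerates the race discussed above
      if (state.get() == DriverState.INTERRUPTED) {
        throw new InterruptedException("query cancelled while acquiring locks");
      }
      acquireOneLock(path); // stand-in for the ZooKeeper lock call
    }
  }

  private void acquireOneLock(String path) { /* ZooKeeper call elided */ }
}
{noformat}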
[jira] [Created] (HIVE-15615) Fix unit test failures caused by HIVE-13696
Yongzhi Chen created HIVE-15615: --- Summary: Fix unit test failures caused by HIVE-13696 Key: HIVE-15615 URL: https://issues.apache.org/jira/browse/HIVE-15615 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The following unit tests failed with the same stack: org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerShowFilters {noformat} 2017-01-11T15:02:27,774 ERROR [main] ql.Driver: FAILED: NullPointerException null java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.cleanName(QueuePlacementRule.java:351) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$User.getQueueForApp(QueuePlacementRule.java:132) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167) at org.apache.hadoop.hive.schshim.FairSchedulerShim.setJobQueueForUserInternal(FairSchedulerShim.java:96) at org.apache.hadoop.hive.schshim.FairSchedulerShim.validateQueueConfiguration(FairSchedulerShim.java:82) at org.apache.hadoop.hive.ql.session.YarnFairScheduling.validateYarnQueue(YarnFairScheduling.java:68) at org.apache.hadoop.hive.ql.Driver.configureScheduling(Driver.java:671) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:543) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1313) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1233) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1223) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 55479: Improve canceling response time for acquiring locks
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55479/ --- Review request for hive, Aihua Xu and Chaoyu Tang. Bugs: HIVE-15572 https://issues.apache.org/jira/browse/HIVE-15572 Repository: hive-git Description --- 1. Add a data structure to pass the driver state. 2. Check the driver state when acquiring locks via ZooKeeper. Diffs - ql/src/java/org/apache/hadoop/hive/ql/Driver.java fd6020b85591ea190aa33ae9f2dc925a38fc7471 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 721974db03f1f29bdb84f41db317e37a6a78ca32 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java 45ead16560ce7514a1ab6f4ac2de6771582a8a73 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 24fbd9af5fb7be6b238c6ed246e360477d3c47de ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 20e114776f143715d5820e6a1acb794a9d6de02c ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockManager.java b2eb99775c220e9ce347fa1cb918ebf4e738eac2 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java ce220a21de01a188da940e4511ee6876d0c15a4a ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManagerImpl.java ed022d9193f14436ed527f9cbd3df45d48857cf4 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 14d0ef4e27e0518c1bafcbdcde12f09e101a3321 ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDummyTxnManager.java e189d383b6d090ce151b6ab30fb240c261430239 Diff: https://reviews.apache.org/r/55479/diff/ Testing --- Unit test Manual test Thanks, Yongzhi Chen
[jira] [Created] (HIVE-15572) Improve the response time for query canceling when it happens while acquiring locks
Yongzhi Chen created HIVE-15572: --- Summary: Improve the response time for query canceling when it happens while acquiring locks Key: HIVE-15572 URL: https://issues.apache.org/jira/browse/HIVE-15572 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen Assignee: Yongzhi Chen When a query canceling command is sent while Hive is acquiring locks (from ZooKeeper), Hive will still finish acquiring all the locks and then release them, as shown in the following log: it took 165 s to finish acquiring the locks, then spent 81 s releasing them. We can improve the performance by not acquiring any more locks and releasing the held locks when the query canceling command is received. Background-Pool: Thread-224]: 2017-01-03 10:50:35,413 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-224]: 2017-01-03 10:51:00,671 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:00,672 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:00,672 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-218]: FAILED: query select count(*) from manyparttbl has been cancelled 2017-01-03 10:51:00,673 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-218]: 2017-01-03 10:51:40,755 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-215]: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Invitation for Hive committers to become ORC committers
Hi Owen, I am interested. Thanks Yongzhi Chen On Thu, Dec 15, 2016 at 4:12 PM, Owen O'Malley <omal...@apache.org> wrote: > All, >As you are aware, we are in the last stages of removing the forked ORC > code out of Hive. The goal of moving ORC out of Hive was to increase its > community and we want to be very deliberately inclusive of the Hive > development community. Towards that end, the ORC PMC wants to welcome > anyone who is already a Hive committer to become a committer on ORC. > > Please respond on this thread to let us know if you are interested. > > Thanks, >Owen on behalf of the ORC PMC >
[jira] [Created] (HIVE-15437) avro tables join fails when - tbl join tbl_postfix
Yongzhi Chen created HIVE-15437: --- Summary: avro tables join fails when - tbl join tbl_postfix Key: HIVE-15437 URL: https://issues.apache.org/jira/browse/HIVE-15437 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The following queries return good results: select * from table1 where col1=key1; select * from table1_1 where col1=key1; When joining them together, it gets the following error: {noformat} Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found long, expecting union at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:43) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:141) ~[hive-shims-common-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] {noformat} Both Avro tables are defined using an Avro schema, and the first table's name is a prefix of the second table's name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15391) Location validation for table should ignore the values for view.
Yongzhi Chen created HIVE-15391: --- Summary: Location validation for table should ignore the values for view. Key: HIVE-15391 URL: https://issues.apache.org/jira/browse/HIVE-15391 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 2.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When using schematool to do location validation, we get error messages for views, for example: {noformat} In DB with Name: viewa NULL Location for TABLE with Name: viewa In DB with Name: viewa NULL Location for TABLE with Name: viewb In DB with Name: viewa {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15359) skip.footer.line.count doesn't work properly for certain situations
Yongzhi Chen created HIVE-15359: --- Summary: skip.footer.line.count doesn't work properly for certain situations Key: HIVE-15359 URL: https://issues.apache.org/jira/browse/HIVE-15359 Project: Hive Issue Type: Bug Components: Reader Reporter: Yongzhi Chen Assignee: Yongzhi Chen The reproduction of this issue is very similar to HIVE-12718, but the data file is larger than 128 MB. In this case, even when making sure only one mapper is used, the footer is still wrongly skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
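For context, a minimal Java sketch of how footer skipping with a bounded FIFO works, under the assumption that a single reader sees the whole file; all names are hypothetical and this is not Hive's reader code. A row is emitted only once footerCount newer rows have been read, so the last footerCount rows are dropped. This also suggests why a large file is fragile: if the file is split across readers, each split-local reader would wrongly hold back its own last rows.
{noformat}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FooterSkipper {
  static List<String> skipFooter(Iterable<String> rows, int footerCount) {
    List<String> out = new ArrayList<>();
    Deque<String> buf = new ArrayDeque<>(footerCount + 1);
    for (String row : rows) {
      buf.addLast(row);
      if (buf.size() > footerCount) {
        out.add(buf.removeFirst()); // safe: footerCount newer rows exist
      }
    }
    return out; // rows still buffered are the footer and are discarded
  }
}
{noformat}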
[jira] [Created] (HIVE-15320) Cross Realm hive query is failing with KERBEROS authentication error
Yongzhi Chen created HIVE-15320: --- Summary: Cross Realm hive query is failing with KERBEROS authentication error Key: HIVE-15320 URL: https://issues.apache.org/jira/browse/HIVE-15320 Project: Hive Issue Type: Improvement Components: Security Reporter: Yongzhi Chen Executing a cross-realm query fails. Authentication against the remote NN is tried with SIMPLE, not KERBEROS. It looks like Hive does not obtain the needed ticket for the remote NN. insert overwrite directory 'hdfs://differentrealmhost:8020/hive/test' select * from currentrealmtable where ...; It will fail with java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] The hdfs distcp command works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
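A hedged Java sketch of the missing step: before submitting the job, obtain delegation tokens for every NameNode the query touches, including the cross-realm output path. TokenCache.obtainTokensForNamenodes is the standard Hadoop helper for this; whether this is where Hive's fix belongs is an assumption, not a statement about the eventual patch.
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.security.TokenCache;

class CrossRealmTokens {
  static void addTokens(Job job, Path outputDir) throws Exception {
    Configuration conf = job.getConfiguration();
    // asks each path's NameNode (possibly in another realm) for a delegation
    // token and stores it in the job's credentials for the tasks to use
    TokenCache.obtainTokensForNamenodes(
        job.getCredentials(), new Path[] { outputDir }, conf);
  }
}
{noformat}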
Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53966/#review156659 --- The latest patch solved all the issues Illya Yalovyy pointed out, the fix looks good. +1 - Yongzhi Chen On Nov. 22, 2016, 10:35 p.m., Sergio Pena wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53966/ > --- > > (Updated Nov. 22, 2016, 10:35 p.m.) > > > Review request for hive. > > > Bugs: HIVE-15199 > https://issues.apache.org/jira/browse/HIVE-15199 > > > Repository: hive-git > > > Description > --- > > The patch helps execute repeated INSERT INTO statements on S3 tables when the > scratch directory is on S3. > > > Diffs > - > > itests/hive-blobstore/src/test/queries/clientpositive/insert_into.q > 919ff7d9c7cb40062d68b876d6acbc8efb8a8cf1 > itests/hive-blobstore/src/test/results/clientpositive/insert_into.q.out > c25d0c4eec6983b6869e2eba711b39ba91a4c6e0 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java > 61b8bd0ac40cffcd6dca0fc874940066bc0aeffe > > Diff: https://reviews.apache.org/r/53966/diff/ > > > Testing > --- > > > Thanks, > > Sergio Pena > >
Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53966/#review156644 --- Ship it! Ship It! - Yongzhi Chen On Nov. 21, 2016, 11:54 p.m., Sergio Pena wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53966/ > --- > > (Updated Nov. 21, 2016, 11:54 p.m.) > > > Review request for hive. > > > Bugs: HIVE-15199 > https://issues.apache.org/jira/browse/HIVE-15199 > > > Repository: hive-git > > > Description > --- > > The patch helps execute repeated INSERT INTO statements on S3 tables when the > scratch directory is on S3. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/common/FileUtils.java > 1d8c04160c35e48781b20f8e6e14760c19df9ca5 > itests/hive-blobstore/src/test/queries/clientpositive/insert_into.q > 919ff7d9c7cb40062d68b876d6acbc8efb8a8cf1 > itests/hive-blobstore/src/test/results/clientpositive/insert_into.q.out > c25d0c4eec6983b6869e2eba711b39ba91a4c6e0 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java > 61b8bd0ac40cffcd6dca0fc874940066bc0aeffe > > Diff: https://reviews.apache.org/r/53966/diff/ > > > Testing > --- > > > Thanks, > > Sergio Pena > >
Re: Review Request 53966: HIVE-15199: INSERT INTO data on S3 is replacing the old rows with the new ones
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53966/#review156639 --- ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 2951) <https://reviews.apache.org/r/53966/#comment226831> If (isBlobStoragePath && !destFs.exists(destFilePath)), then the second condition, !destFs.rename(sourcePath, destFilePath), will be evaluated. I assume you do not want that to be called, right? - Yongzhi Chen On Nov. 21, 2016, 11:54 p.m., Sergio Pena wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53966/ > --- > > (Updated Nov. 21, 2016, 11:54 p.m.) > > > Review request for hive. > > > Bugs: HIVE-15199 > https://issues.apache.org/jira/browse/HIVE-15199 > > > Repository: hive-git > > > Description > --- > > The patch helps execute repeated INSERT INTO statements on S3 tables when the > scratch directory is on S3. > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/common/FileUtils.java > 1d8c04160c35e48781b20f8e6e14760c19df9ca5 > itests/hive-blobstore/src/test/queries/clientpositive/insert_into.q > 919ff7d9c7cb40062d68b876d6acbc8efb8a8cf1 > itests/hive-blobstore/src/test/results/clientpositive/insert_into.q.out > c25d0c4eec6983b6869e2eba711b39ba91a4c6e0 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java > 61b8bd0ac40cffcd6dca0fc874940066bc0aeffe > > Diff: https://reviews.apache.org/r/53966/diff/ > > > Testing > --- > > > Thanks, > > Sergio Pena > >
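The hazard the comment points at, isolated into a small Java sketch (hypothetical method names, not the patched Hive.java): && short-circuits left to right, so when the first two terms hold, the rename runs as a side effect of evaluating the condition itself. The second method is equivalent but makes the side effect explicit.
{noformat}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class RenameCheck {
  // The rename is attempted whenever the first two terms are true.
  static void moveConfusing(FileSystem destFs, boolean isBlobStoragePath,
      Path src, Path dest) throws IOException {
    if (isBlobStoragePath && !destFs.exists(dest) && !destFs.rename(src, dest)) {
      throw new IOException("rename failed"); // rename already ran above
    }
  }

  // Equivalent control flow, with the side effect visible as a statement.
  static void moveExplicit(FileSystem destFs, boolean isBlobStoragePath,
      Path src, Path dest) throws IOException {
    if (isBlobStoragePath && !destFs.exists(dest)) {
      if (!destFs.rename(src, dest)) {
        throw new IOException("rename failed");
      }
    }
  }
}
{noformat}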
[jira] [Created] (HIVE-15074) Schematool provides a way to detect invalid entries in VERSION table
Yongzhi Chen created HIVE-15074: --- Summary: Schematool provides a way to detect invalid entries in VERSION table Key: HIVE-15074 URL: https://issues.apache.org/jira/browse/HIVE-15074 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen Priority: Minor For some unknown reason, we see that a customer's HMS cannot start because there are multiple entries in their HMS VERSION table. Schematool should provide a way to validate the HMS db and provide warning and fix options for this kind of issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15073) Schematool should detect malformed URIs
Yongzhi Chen created HIVE-15073: --- Summary: Schematool should detect malformed URIs Key: HIVE-15073 URL: https://issues.apache.org/jira/browse/HIVE-15073 Project: Hive Issue Type: Improvement Reporter: Yongzhi Chen For various causes (mostly unknown), HMS DB tables sometimes have invalid entries, for example a URI missing its scheme in the SDS table's LOCATION column or the DBS table's DB_LOCATION_URI column. These malformed URIs lead to hard-to-analyze errors in HIVE and SENTRY. Schematool needs to provide a command to detect these malformed URIs, give a warning, and provide an option to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
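A minimal Java sketch of the validation being requested, assuming "malformed" means a location whose URI has no scheme (e.g. "/user/hive/warehouse/t" instead of "hdfs://nn:8020/user/hive/warehouse/t"); this is an illustration, not the schematool implementation.
{noformat}
import java.net.URI;
import java.net.URISyntaxException;

class LocationValidator {
  static boolean isMalformed(String location) {
    if (location == null || location.isEmpty()) {
      return true;
    }
    try {
      return new URI(location).getScheme() == null; // no scheme -> malformed
    } catch (URISyntaxException e) {
      return true; // unparseable URI
    }
  }
}
{noformat}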
[jira] [Created] (HIVE-15072) Schematool should recognize missing tables in metastore
Yongzhi Chen created HIVE-15072: --- Summary: Schematool should recognize missing tables in metastore Key: HIVE-15072 URL: https://issues.apache.org/jira/browse/HIVE-15072 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Yongzhi Chen When installing a new database fails halfway (for some other reason), not all of the metastore tables are installed. This causes the HMS server to fail to start up due to missing tables. Re-running the schematool succeeds, and the stdout log says: "Database already has tables. Skipping table creation". However, restarting HMS gets the same error reporting missing tables. Schematool should detect missing tables and provide options to go ahead and recreate the missing tables in the case of a new installation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 52835: HIVE-14926: Keep Schema in consistent state where schemaTool fails or succeeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52835/#review152674 --- beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java (line 231) <https://reviews.apache.org/r/52835/#comment221767> Is it possible that a command has more than one line? - Yongzhi Chen On Oct. 13, 2016, 8:43 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52835/ > --- > > (Updated Oct. 13, 2016, 8:43 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-14926: Keep Schema in consistent state where schemaTool fails or succeeds > > > Diffs > - > > beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java 181f0d2 > beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java cd36ddf > itests/hive-unit/src/test/java/org/apache/hive/beeline/TestSchemaTool.java > 0d5f9c8 > > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java > 9c30ee7 > > Diff: https://reviews.apache.org/r/52835/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
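A hypothetical Java sketch of what the question is probing: schema scripts carry statements that span several lines, so the runner must accumulate lines until the ';' terminator rather than executing line by line. Not the actual HiveSchemaHelper code, and it ignores ';' inside string literals for brevity.
{noformat}
import java.util.ArrayList;
import java.util.List;

class ScriptReader {
  static List<String> readCommands(List<String> scriptLines) {
    List<String> commands = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String raw : scriptLines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("--")) {
        continue; // skip blanks and comment lines
      }
      current.append(line).append(' ');
      if (line.endsWith(";")) { // terminator: the command is complete
        commands.add(current.toString().trim());
        current.setLength(0);
      }
    }
    return commands;
  }
}
{noformat}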
Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52559/#review152659 --- The 8th version looks good to me. +1 - Yongzhi Chen On Oct. 13, 2016, 11:38 p.m., Chaoyu Tang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52559/ > --- > > (Updated Oct. 13, 2016, 11:38 p.m.) > > > Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and > Yongzhi Chen. > > > Bugs: HIVE-14799 > https://issues.apache.org/jira/browse/HIVE-14799 > > > Repository: hive-git > > > Description > --- > > This patch is going to fix a couple of Driver issues related to the close > request from a thread other than the one running the query (e.g. from > SQLOperation cancel via Timeout or Ctrl-C): > 1. Driver is not thread safe and usually supports only one thread at a time > since it has variables like ctx, plan which are not thread protected. But > certain special use cases need to access the Driver objects from multiple > threads. For example, when a query runs in a background thread, driver.close > is invoked in another thread by the query timeout (see HIVE-4924). The close > process could nullify the shared variables like ctx which could cause NPE in > the other query thread which is using them. This runtime exception is > unpredictable and not well handled in the code. Some resources (e.g. locks, > files) are left behind and not cleaned because there are no more available > references to them. In this patch, I use the waiting in the close which > makes sure only one thread uses these variables and the resource cleaning > happens after the query finished (or was interrupted). > 2. SQLOperation.cancel sends the interrupt signal to the background thread > running the query (via backgroundHandle.cancel(true)) but it could not stop > that process since there is no code to capture the signal in the process. In > other words, the current timeout code could not gracefully and promptly stop the > query process, though it could eventually stop the process by killing the > running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901). > So in the patch, I added a couple of checkpoints to intercept the interrupt > signal either set by the close method (a volatile variable) or thread.interrupt. > They should be helpful to capture these signals earlier, though not > immediately. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 > > Diff: https://reviews.apache.org/r/52559/diff/ > > > Testing > --- > > Manual tests > Precommit tests > > > Thanks, > > Chaoyu Tang > >
Re: Review Request 52559: HIVE-14799: Query operation are not thread safe during its cancellation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/52559/#review152556 --- It is not thread safe, because releaseDriverContext can be called in compiling mode from cancel, but it seems only the null value matters. So use the local variable (driverCxt) to avoid an NPE after close() is called from cancel? - Yongzhi Chen On Oct. 12, 2016, 4:31 a.m., Chaoyu Tang wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/52559/ > --- > > (Updated Oct. 12, 2016, 4:31 a.m.) > > > Review request for hive, Sergey Shelukhin, Thejas Nair, Vaibhav Gumashta, and > Yongzhi Chen. > > > Bugs: HIVE-14799 > https://issues.apache.org/jira/browse/HIVE-14799 > > > Repository: hive-git > > > Description > --- > > This patch is going to fix a couple of Driver issues related to the close > request from a thread other than the one running the query (e.g. from > SQLOperation cancel via Timeout or Ctrl-C): > 1. Driver is not thread safe and usually supports only one thread at a time > since it has variables like ctx, plan which are not thread protected. But > certain special use cases need to access the Driver objects from multiple > threads. For example, when a query runs in a background thread, driver.close > is invoked in another thread by the query timeout (see HIVE-4924). The close > process could nullify the shared variables like ctx which could cause NPE in > the other query thread which is using them. This runtime exception is > unpredictable and not well handled in the code. Some resources (e.g. locks, > files) are left behind and not cleaned because there are no more available > references to them. In this patch, I use the waiting in the close which > makes sure only one thread uses these variables and the resource cleaning > happens after the query finished (or was interrupted). > 2. SQLOperation.cancel sends the interrupt signal to the background thread > running the query (via backgroundHandle.cancel(true)) but it could not stop > that process since there is no code to capture the signal in the process. In > other words, the current timeout code could not gracefully and promptly stop the > query process, though it could eventually stop the process by killing the > running tasks (e.g. MapRedTask) via driverContext.shutdown (see HIVE-5901). > So in the patch, I added a couple of checkpoints to intercept the interrupt > signal either set by the close method (a volatile variable) or thread.interrupt. > They should be helpful to capture these signals earlier, though not > immediately. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java dd55434 > > Diff: https://reviews.apache.org/r/52559/diff/ > > > Testing > --- > > Manual tests > Precommit tests > > > Thanks, > > Chaoyu Tang > >
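A condensed Java sketch of the two ideas in the description above, with hypothetical names: a volatile cancel flag set by close()/cancel() from another thread, and checkpoints between work units that also observe thread interruption. This is an illustration of the pattern, not the HIVE-14799 patch itself.
{noformat}
class QueryRunner {
  private volatile boolean cancelRequested = false;

  // called from the timeout/Ctrl-C thread
  void cancel() {
    cancelRequested = true;
  }

  void run(Runnable[] tasks) throws InterruptedException {
    for (Runnable task : tasks) {
      // checkpoint: notice either signal before starting the next unit
      if (cancelRequested || Thread.currentThread().isInterrupted()) {
        throw new InterruptedException("query cancelled");
      }
      task.run();
    }
  }
}
{noformat}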
Re: Review Request 50525: HIVE-14341: Altered skewed location is not respected for list bucketing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50525/#review149640 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java (line 234) <https://reviews.apache.org/r/50525/#comment217314> Any reason you changed the logic from replace (overwrite) to something like insert into? - Yongzhi Chen On Sept. 19, 2016, 9:02 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50525/ > --- > > (Updated Sept. 19, 2016, 9:02 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-14341: Altered skewed location is not respected for list bucketing > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java e386717 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java da46854 > > ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java > ba4f6a7 > ql/src/test/queries/clientpositive/create_alter_list_bucketing_table1.q > bf89e8f > ql/src/test/results/clientpositive/create_alter_list_bucketing_table1.q.out > 216d3be > > Diff: https://reviews.apache.org/r/50525/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
Re: Review Request 50525: HIVE-14341: Altered skewed location is not respected for list bucketing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50525/#review149635 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java (line 899) <https://reviews.apache.org/r/50525/#comment217309> You change the old logic here a little bit in the following case: when locationMap has skewedValsCandidate, but allSkewedVals.contains(skewedValsCandidate) == false. Before your change, it uses defaultKey in locationMap, while after the change, skewedValsCandidate is used. Is that safe? - Yongzhi Chen On Sept. 19, 2016, 9:02 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50525/ > --- > > (Updated Sept. 19, 2016, 9:02 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-14341: Altered skewed location is not respected for list bucketing > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java e386717 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java da46854 > > ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java > ba4f6a7 > ql/src/test/queries/clientpositive/create_alter_list_bucketing_table1.q > bf89e8f > ql/src/test/results/clientpositive/create_alter_list_bucketing_table1.q.out > 216d3be > > Diff: https://reviews.apache.org/r/50525/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
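To make the reviewer's question concrete, here is a Java reading of the two lookup orders under discussion, with hypothetical names: "before" consults the declared skewed values first and falls back to the default directory key, while "after" trusts any key present in the location map. A sketch of the comment's interpretation, not the actual FileSinkOperator code.
{noformat}
import java.util.List;
import java.util.Map;
import org.apache.hadoop.fs.Path;

class SkewedLocationLookup {
  static Path before(Map<List<String>, Path> locationMap,
      List<List<String>> allSkewedVals, List<String> candidate,
      List<String> defaultKey) {
    if (allSkewedVals.contains(candidate)) {
      return locationMap.get(candidate);
    }
    return locationMap.get(defaultKey); // undeclared value -> default dir
  }

  static Path after(Map<List<String>, Path> locationMap,
      List<String> candidate, List<String> defaultKey) {
    Path p = locationMap.get(candidate); // used even if not a declared value
    return p != null ? p : locationMap.get(defaultKey);
  }
}
{noformat}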
[jira] [Created] (HIVE-14743) ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs
Yongzhi Chen created HIVE-14743: --- Summary: ArrayIndexOutOfBoundsException - HBASE-backed views' query with JOINs Key: HIVE-14743 URL: https://issues.apache.org/jira/browse/HIVE-14743 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The stack: {noformat} 2016-09-13T09:38:49,972 ERROR [186b4545-65b5-4bfc-bc8e-3e14e251bb12 main] exec.Task: Job Submission failed with exception 'java.lang.ArrayIndexOutOfBoundsException(1)' java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.createFilterScan(HiveHBaseTableInputFormat.java:224) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplitsInternal(HiveHBaseTableInputFormat.java:492) at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:449) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:466) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:356) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:546) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) {noformat} Repro: {noformat} CREATE TABLE HBASE_TABLE_TEST_1( cvalue string , pk string, ccount int ) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'='cf:val,:key,cf2:count', 'hbase.scan.cache'='500', 'hbase.scan.cacheblocks'='false', 'serialization.format'='1') TBLPROPERTIES ( 'hbase.table.name'='hbase_table_test_1', 'serialization.null.format'='' ); CREATE VIEW VIEW_HBASE_TABLE_TEST_1 AS SELECT hbase_table_test_1.cvalue,hbase_table_test_1.pk,hbase_table_test_1.ccount FROM hbase_table_test_1 WHERE hbase_table_test_1.ccount IS NOT NULL; CREATE TABLE HBASE_TABLE_TEST_2( cvalue string , pk string , ccount int ) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'='cf:val,:key,cf2:count', 'hbase.scan.cache'='500', 'hbase.scan.cacheblocks'='false', 'serialization.format'='1') TBLPROPERTIES ( 'hbase.table.name'='hbase_table_test_2', 'serialization.null.format'=''); CREATE VIEW VIEW_HBASE_TABLE_TEST_2 AS SELECT hbase_table_test_2.cvalue,hbase_table_test_2.pk,hbase_table_test_2.ccount FROM hbase_table_test_2 WHERE hbase_table_test_2.pk >='3-h-0' AND hbase_table_test_2.pk <= '3-h-g' AND hbase_table_test_2.ccount IS NOT NULL; set hive.auto.convert.join=false; SELECT p.cvalue cvalue FROM `VIEW_HBASE_TABLE_TEST_1` `p` LEFT OUTER JOIN `VIEW_HBASE_TABLE_TEST_2` `A1` ON `p`.cvalue = `A1`.cvalue LEFT OUTER JOIN 
`VIEW_HBASE_TABLE_TEST_1` `A2` ON `p`.cvalue = `A2`.cvalue; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14715) Hive throws NumberFormatException with query with Null value
Yongzhi Chen created HIVE-14715: --- Summary: Hive throws NumberFormatException with query with Null value Key: HIVE-14715 URL: https://issues.apache.org/jira/browse/HIVE-14715 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen A java.lang.NumberFormatException will be thrown with the following reproduction: set hive.cbo.enable=false; CREATE TABLE `paqtest`( `c1` int, `s1` string, `s2` string, `bn1` bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; insert into paqtest values (58, '', 'ABC', 0); SELECT 'Pricing mismatch' AS category, c1, NULL AS itemtype_used, NULL AS acq_itemtype, s2, NULL AS currency_used_avg, NULL AS acq_items_avg, sum(bn1) AS cca FROM paqtest WHERE (s1 IS NULL OR length(s1) = 0) GROUP BY 'Pricing mismatch', c1, NULL, NULL, s2, NULL, NULL; The stack is like the following: java.lang.NumberFormatException: ABC GroupByOperator.process(Object, int) line: 773 ExecReducer.reduce(Object, Iterator, OutputCollector, Reporter) line: 236 ReduceTask.runOldReducer(JobConf, TaskUmbilicalProtocol, TaskReporter, RawKeyValueIterator, RawComparator, Class, Class) line: 444 ReduceTask.run(JobConf, TaskUmbilicalProtocol) line: 392 LocalJobRunner$Job$ReduceTaskRunnable.run() line: 319 Executors$RunnableAdapter.call() line: 471 It works fine when hive.cbo.enable = true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14596) Canceling hive query takes very long time
Yongzhi Chen created HIVE-14596: --- Summary: Canceling hive query takes very long time Key: HIVE-14596 URL: https://issues.apache.org/jira/browse/HIVE-14596 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen When the Hue user clicks cancel, the Hive query does not stop immediately; it can take a very long time. And in the yarn job history you will see exceptions like the following: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive/hive/80a5cfdb-9f98-44d2-ae53-332c8dae62a3/hive_2016-08-20_07-06-12_819_8780093905859269639-3/-mr-1/.hive-staging_hive_2016-08-20_07-06-12_819_8780093905859269639-3/_task_tmp.-ext-10001/_tmp.00_0 (inode 28224): File does not exist. Holder DFSClient_attempt_1471630445417_0034_m_00_0_-50732711_1 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3624) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3427) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3283) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.abortWriters(FileSinkOperator.java:246) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1007) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:206) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14538) beeline throws exceptions parsing hive config when using the !sh statement
Yongzhi Chen created HIVE-14538: --- Summary: beeline throws exceptions parsing hive config when using the !sh statement Key: HIVE-14538 URL: https://issues.apache.org/jira/browse/HIVE-14538 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When beeline has a connection to a server, in some environments it has the following problem: {noformat} 0: jdbc:hive2://localhost> !verbose verbose: on 0: jdbc:hive2://localhost> !sh id java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hive.beeline.Commands.addConf(Commands.java:758) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704) at org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 0: jdbc:hive2://localhost> !sh echo hello java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hive.beeline.Commands.addConf(Commands.java:758) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:704) at org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 0: jdbc:hive2://localhost> {noformat} Also it breaks if there is no connection established: {noformat} beeline> !sh id java.lang.NullPointerException at org.apache.hive.beeline.BeeLine.createStatement(BeeLine.java:1897) at org.apache.hive.beeline.Commands.getConfInternal(Commands.java:724) at org.apache.hive.beeline.Commands.getHiveConf(Commands.java:702) at
org.apache.hive.beeline.Commands.sh(Commands.java:1002) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1081) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:845) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14519) Multi insert query bug
Yongzhi Chen created HIVE-14519: --- Summary: Multi insert query bug Key: HIVE-14519 URL: https://issues.apache.org/jira/browse/HIVE-14519 Project: Hive Issue Type: Bug Components: Logical Optimizer Reporter: Yongzhi Chen Assignee: Yongzhi Chen When running multi-insert queries, if one of the inserts returns no results, another insert does not return the right result. For example, after the following query there is no value in /tmp/emp/dir3/00_0:
{noformat}
From (select * from src) a
insert overwrite directory '/tmp/emp/dir1/'
select key, value
insert overwrite directory '/tmp/emp/dir2/'
select 'header' where 1=2
insert overwrite directory '/tmp/emp/dir3/'
select key, value where key = 100;
{noformat}
The where clause in the second insert should not affect the third insert. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
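A possible workaround until this is fixed — a sketch only, assuming the inserts are acceptable to run as independent statements (at the cost of scanning src once per insert):
{noformat}
-- each insert carries only its own where clause, so no predicate can leak between them
insert overwrite directory '/tmp/emp/dir1/' select key, value from src;
insert overwrite directory '/tmp/emp/dir2/' select 'header' from src where 1=2;
insert overwrite directory '/tmp/emp/dir3/' select key, value from src where key = 100;
{noformat}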
[jira] [Created] (HIVE-14015) SMB MapJoin failed for Hive on Spark when kerberized
Yongzhi Chen created HIVE-14015: --- Summary: SMB MapJoin failed for Hive on Spark when kerberized Key: HIVE-14015 URL: https://issues.apache.org/jira/browse/HIVE-14015 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 2.0.0, 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication It could be reproduced:
1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data
2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;
3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;
4) reproduce this issue:
SET hive.execution.engine=spark;
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c;
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13991) Union All on view fail with no valid permission on underneath table
Yongzhi Chen created HIVE-13991: --- Summary: Union All on view fail with no valid permission on underneath table Key: HIVE-13991 URL: https://issues.apache.org/jira/browse/HIVE-13991 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Yongzhi Chen Assignee: Yongzhi Chen When Sentry is enabled: create view V as select * from T; when the user has read permission on view V but does not have read permission on table T, select * from V union all select * from V fails with: {noformat} 0: jdbc:hive2://> select * from s07view union all select * from s07view limit 1; Error: Error while compiling statement: FAILED: SemanticException No valid privileges Required privileges for this query: Server=server1->Db=default->Table=sample_07->action=select; (state=42000,code=4) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
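The inconsistency is easy to see with the names from the error above (s07view is the view, sample_07 the underlying table): a plain select needs only the view privilege, while the union of the view with itself suddenly demands a table-level privilege:
{noformat}
select * from s07view limit 1;                          -- works: SELECT on the view is enough
select * from s07view union all select * from s07view;  -- fails: plan asks for SELECT on sample_07
{noformat}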
[jira] [Created] (HIVE-13932) Hive SMB Map Join with small set of LIMIT failed with NPE
Yongzhi Chen created HIVE-13932: --- Summary: Hive SMB Map Join with small set of LIMIT failed with NPE Key: HIVE-13932 URL: https://issues.apache.org/jira/browse/HIVE-13932 Project: Hive Issue Type: Bug Affects Versions: 2.0.0, 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen
1) prepare sample data:
a=1
while [[ $a -lt 100 ]]; do echo $a ; let a=$a+1; done > data
2) prepare source hive table:
CREATE TABLE `s`(`c` string);
load data local inpath 'data' into table s;
3) prepare the bucketed table:
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
CREATE TABLE `t`(`c` string) CLUSTERED BY (c) SORTED BY (c) INTO 5 BUCKETS;
insert into t select * from s;
4) reproduce this issue:
SET hive.auto.convert.sortmerge.join = true;
SET hive.auto.convert.sortmerge.join.bigtable.selection.policy = org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ;
SET hive.auto.convert.sortmerge.join.noconditionaltask = true;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
select * from t join t t1 on t.c=t1.c limit 1;
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
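Until this is fixed, a plausible mitigation (a sketch only, reversing the switches from step 4 so the query falls back to a regular join):
{noformat}
SET hive.auto.convert.sortmerge.join = false;
SET hive.optimize.bucketmapjoin.sortedmerge = false;
select * from t join t t1 on t.c=t1.c limit 1;
{noformat}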
Re: Review Request 47787: HIVE-13453: Support ORDER BY and windowing clause in partitioning clause with distinct function
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47787/#review134784 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java (line 169) <https://reviews.apache.org/r/47787/#comment199690> How do you handle the countDistinct non-windowing case? - Yongzhi Chen On May 24, 2016, 6:51 p.m., Aihua Xu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47787/ > --- > > (Updated May 24, 2016, 6:51 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-13453: Support ORDER BY and windowing clause in partitioning clause with > distinct function > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java > 2f4a94c3796d3aff986eb638246248b75306183c > ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java > 3b54b4998c9efbf34bd9c5b08de55cd7062a0843 > ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java > 5ce72004e03bc19a38bd87ae70f38a0d35c20927 > ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFunctionDef.java > ed6c67156b93d6f9e4b76fb76dfa28c5dee6fd0c > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java > 3c1ce26b26646a6075b3a661816e8d1b50ffc78e > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java > 2825045890de1bcc414197ad3e06e723b9d212f3 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFParameterInfo.java > 6a62d7cc324286ae9aee95d2d71a688859f8c03f > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java > 7b1d6e545cdf35f3b2906621c7b0208bf0433731 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/SimpleGenericUDAFParameterInfo.java > 1a1b570256afff46761daf4ebcf1da5e8f0e4f88 > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java > 858b47ad43fa751e23482e4cb58f77bb9fb16a27 > ql/src/test/queries/clientpositive/windowing_distinct.q > bb192a7882fda592b3d2ba09a10c2f899aa5e165 > ql/src/test/results/clientpositive/windowing_distinct.q.out > 074a59498ebebc9e78553f68f59dd00bb51f4792 > > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java > c58e8ed05453c78cbe2e4daf0b7afa51adbc0ce9 > > Diff: https://reviews.apache.org/r/47787/diff/ > > > Testing > --- > > > Thanks, > > Aihua Xu > >
Re: Review Request 47040: Monitor changes to FairScheduler.xml file and automatically update / validate jobs submitted to fair-scheduler
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47040/#review133592 --- ql/src/java/org/apache/hadoop/hive/ql/Driver.java (line 533) <https://reviews.apache.org/r/47040/#comment198119> This if statement duplicates the Precondition. If you want to throw an exception, use only the Precondition; otherwise, just use the if statement. Using both ends up checking the same condition twice. - Yongzhi Chen On May 14, 2016, 5:51 p.m., Reuben Kuhnert wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47040/ > --- > > (Updated May 14, 2016, 5:51 p.m.) > > > Review request for hive, Lenni Kuff, Mohit Sabharwal, and Sergio Pena. > > > Bugs: HIVE-13696 > https://issues.apache.org/jira/browse/HIVE-13696 > > > Repository: hive-git > > > Description > --- > > Ensure that jobs sent to YARN with impersonation off are correctly routed to > the proper queue based on fair-scheduler.xml. Monitor this file for changes > and validate that jobs can only be sent to queues authorized for the user. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > 3fecc5c4ca2a06a031c0c4a711fb49e757c49062 > ql/src/java/org/apache/hadoop/hive/ql/session/YarnFairScheduling.java > PRE-CREATION > service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java > a0015ebc655931f241b28c53fbb94cfe172841b1 > shims/common/src/main/java/org/apache/hadoop/hive/shims/SchedulerShim.java > 63803b8b0752745bd2fedaccc5d100befd97093b > shims/scheduler/pom.xml b36c12325c588cdb609c6200b1edef73a2f79552 > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerQueueAllocator.java > PRE-CREATION > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/FairSchedulerShim.java > 372244dc3c989d2a3ae2eb2bfb8cd0a235705e18 > > shims/scheduler/src/main/java/org/apache/hadoop/hive/schshim/QueueAllocator.java > PRE-CREATION > > shims/scheduler/src/test/java/org/apache/hadoop/hive/schshim/TestFairSchedulerQueueAllocator.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/47040/diff/ > > > Testing > --- > > > Thanks, > > Reuben Kuhnert > >
[jira] [Created] (HIVE-13632) Hive failing on insert empty array into parquet table
Yongzhi Chen created HIVE-13632: --- Summary: Hive failing on insert empty array into parquet table Key: HIVE-13632 URL: https://issues.apache.org/jira/browse/HIVE-13632 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The insert will fail with the following stack: {noformat} Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:271) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$ListDataWriter.write(DataWritableWriter.java:271) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:199) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:215) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:88) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:116) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697) {noformat} Reproduce:
{noformat}
create table test_small (
  key string,
  arrayValues array<string>)
stored as parquet;
insert into table test_small select 'abcd', array() from src limit 1;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
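A common workaround sketch (an assumption, not from the report; arr and some_src are hypothetical stand-ins for the real source column and table): write NULL instead of an empty array, which the Parquet writer can encode:
{noformat}
insert into table test_small
select key, IF(size(arr) = 0, NULL, arr)  -- map empty arrays to NULL before the Parquet writer sees them
from some_src;
{noformat}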
[jira] [Created] (HIVE-13570) Some query with Union all fails when CBO is off
Yongzhi Chen created HIVE-13570: --- Summary: Some query with Union all fails when CBO is off Key: HIVE-13570 URL: https://issues.apache.org/jira/browse/HIVE-13570 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries with union all throw IndexOutOfBoundsException when: set hive.cbo.enable=false; set hive.ppd.remove.duplicatefilters=true; The stack is as follows: {noformat} java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273) at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13200) Aggregation functions returning empty rows on partitioned columns
Yongzhi Chen created HIVE-13200: --- Summary: Aggregation functions returning empty rows on partitioned columns Key: HIVE-13200 URL: https://issues.apache.org/jira/browse/HIVE-13200 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 2.0.0, 1.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Running aggregation functions like MAX, MIN, DISTINCT against partitioned columns will return empty rows if the table has the property: 'skip.header.line.count'='1' Reproduce:
{noformat}
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int) PARTITIONED BY (b int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
TBLPROPERTIES('skip.header.line.count'='1');
INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4);
INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4);
SELECT * FROM test;
SELECT DISTINCT b FROM test;
SELECT MAX(b) FROM test;
SELECT DISTINCT a FROM test;
{noformat}
The output:
{noformat}
0: jdbc:hive2://localhost:1/default> SELECT * FROM test;
+---------+---------+--+
| test.a  | test.b  |
+---------+---------+--+
| 2       | 1       |
| 3       | 1       |
| 4       | 1       |
| 2       | 2       |
| 3       | 2       |
| 4       | 2       |
+---------+---------+--+
6 rows selected (0.631 seconds)
0: jdbc:hive2://localhost:1/default> SELECT DISTINCT b FROM test;
+----+--+
| b  |
+----+--+
+----+--+
No rows selected (47.229 seconds)
0: jdbc:hive2://localhost:1/default> SELECT MAX(b) FROM test;
+-------+--+
| _c0   |
+-------+--+
| NULL  |
+-------+--+
1 row selected (49.508 seconds)
0: jdbc:hive2://localhost:1/default> SELECT DISTINCT a FROM test;
+----+--+
| a  |
+----+--+
| 2  |
| 3  |
| 4  |
+----+--+
3 rows selected (46.859 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
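To confirm the table property is the trigger, a quick diagnostic sketch (assuming ALTER TABLE ... UNSET TBLPROPERTIES is available in the affected build):
{noformat}
ALTER TABLE test UNSET TBLPROPERTIES ('skip.header.line.count');
SELECT DISTINCT b FROM test;  -- should now return 1 and 2
SELECT MAX(b) FROM test;      -- should now return 2
{noformat}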
[jira] [Created] (HIVE-13065) Hive throws NPE when writing map type data to a HBase backed table
Yongzhi Chen created HIVE-13065: --- Summary: Hive throws NPE when writing map type data to a HBase backed table Key: HIVE-13065 URL: https://issues.apache.org/jira/browse/HIVE-13065 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.1.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Hive throws NPE when writing data to a HBase backed table under the below conditions:
# There is a map type column
# The map type column has NULL in its values
Below are the reproduce steps:
*1) Create a HBase backed Hive table*
{code:sql}
create table hbase_test (id bigint, data map<string, string>)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,cf:map_col")
tblproperties ("hbase.table.name" = "hive_test");
{code}
*2) insert data into above table*
{code:sql}
insert overwrite table hbase_test select 1 as id, map('abcd', null) as data from src limit 1;
{code}
The mapreduce job for the insert query fails. Error messages are as below: {noformat} 2016-02-15 02:26:33,225 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1,"_col1":{"abcd":null}}} at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253) ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:731) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:51) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) ... 7 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:666) ...
14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:221) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:236) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:275) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:222) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118) at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
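A defensive workaround sketch (an assumption, not from the report): substitute a sentinel empty string for NULL map values before they reach the HBase serializer:
{code:sql}
insert overwrite table hbase_test
select 1 as id,
       map('abcd', coalesce(cast(null as string), '')) as data  -- coalesce replaces the NULL value with ''
from src limit 1;
{code}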
[jira] [Created] (HIVE-13039) BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table
Yongzhi Chen created HIVE-13039: --- Summary: BETWEEN predicate is not functioning correctly with predicate pushdown on Parquet table Key: HIVE-13039 URL: https://issues.apache.org/jira/browse/HIVE-13039 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 1.2.1, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen BETWEEN becomes exclusive in a parquet table when predicate pushdown is on (as it is by default in newer Hive versions). To reproduce (in a cluster, not a local setup):
CREATE TABLE parquet_tbl(
  key int,
  ldate string)
PARTITIONED BY (lyear string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

insert overwrite table parquet_tbl partition (lyear='2016') select 1, '2016-02-03' from src limit 1;

set hive.optimize.ppd.storage = true;
set hive.optimize.ppd = true;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
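Until the pushdown translation is fixed, the obvious mitigation (a sketch, using the same switches as the reproduction) is to disable storage-level pushdown for affected queries:
{noformat}
set hive.optimize.ppd.storage = false;
select * from parquet_tbl where ldate between '2016-02-03' and '2016-02-03';  -- the row comes back
{noformat}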
[jira] [Created] (HIVE-12795) Vectorized execution causes ClassCastException
Yongzhi Chen created HIVE-12795: --- Summary: Vectorized execution causes ClassCastException Key: HIVE-12795 URL: https://issues.apache.org/jira/browse/HIVE-12795 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen In some Hive versions, when: set hive.auto.convert.join=false; set hive.vectorized.execution.enabled = true; some join queries fail with ClassCastException. The stack: {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableStringObjectInspector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:419) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1102) at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126) ... 22 more {noformat} It cannot be reproduced in Hive 2.0 and 1.3 because of a different code path. Reproduce:
{noformat}
CREATE TABLE test1 (
  id string)
PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES ('serialization.null.format'='');

CREATE TABLE test2 (
  id string)
PARTITIONED BY (
  cr_year bigint,
  cr_month bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
TBLPROPERTIES ('serialization.null.format'='');

set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;

SELECT cr.id1, cr.id2
FROM (SELECT t1.id id1, t2.id id2
      from (select * from test1) t1 left outer join test2 t2 on t1.id=t2.id) cr;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
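Since the failure only appears with vectorization enabled, a session-level mitigation sketch for the affected versions is simply:
{noformat}
set hive.vectorized.execution.enabled = false;
SELECT cr.id1, cr.id2
FROM (SELECT t1.id id1, t2.id id2
      from (select * from test1) t1 left outer join test2 t2 on t1.id=t2.id) cr;
{noformat}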
[jira] [Created] (HIVE-12784) Group by SemanticException: Invalid column reference
Yongzhi Chen created HIVE-12784: --- Summary: Group by SemanticException: Invalid column reference Key: HIVE-12784 URL: https://issues.apache.org/jira/browse/HIVE-12784 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries that work fine in older versions throw SemanticException; the stack trace: {noformat} FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 'key2' 15/12/21 18:56:44 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 10002]: Line 96:1 Invalid column reference 'key2' org.apache.hadoop.hive.ql.parse.SemanticException: Line 96:1 Invalid column reference 'key2' at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4228) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5670) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9007) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9884) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9777) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10250) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10261) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10141) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1158) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} Reproduce:
{noformat}
create table tlb (key int, key1 int, key2 int);
create table src (key int, value string);
select key, key1, key2
from (select a.key, 0 as key1, 0 as key2
      from tlb a inner join src b on a.key = b.key) a
group by key, key1, key2;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
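Because key1 and key2 are constants inside the subquery, an equivalent rewrite (a sketch) that sidesteps the failing group-by path is to group by the non-constant column alone and re-attach the constants:
{noformat}
select key, 0 as key1, 0 as key2
from (select a.key from tlb a inner join src b on a.key = b.key) a
group by key;
{noformat}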
[jira] [Created] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
Yongzhi Chen created HIVE-12646: --- Summary: beeline and HIVE CLI do not parse ; in quote properly Key: HIVE-12646 URL: https://issues.apache.org/jira/browse/HIVE-12646 Project: Hive Issue Type: Bug Components: CLI, Clients Reporter: Yongzhi Chen Assignee: Vaibhav Gumashta Beeline and the CLI have to escape ; inside quotes, while most other shells need not. For example, in Beeline:
{noformat}
0: jdbc:hive2://localhost:1> select ';' from tlb1;
select ';' from tlb1;
15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
Error: Error while compiling statement: FAILED: ParseException line 1:8 cannot recognize input near '' '
{noformat}
while in the mysql shell:
{noformat}
mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
+--------+
| ;foo   |
| ;foo   |
| ;foo   |
+--------+
3 rows in set (0.00 sec)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
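The escape-based workaround that works today (a sketch; the backslash is consumed by beeline's statement splitter, not by HiveQL):
{noformat}
0: jdbc:hive2://localhost:1> select '\;' from tlb1;
{noformat}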
[jira] [Created] (HIVE-12378) Exception on HBaseSerDe.serialize binary field
Yongzhi Chen created HIVE-12378: --- Summary: Exception on HBaseSerDe.serialize binary field Key: HIVE-12378 URL: https://issues.apache.org/jira/browse/HIVE-12378 Project: Hive Issue Type: Bug Components: HBase Handler, Serializers/Deserializers Affects Versions: 1.1.0, 1.0.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen An issue was reproduced with binary-typed HBase columns in Hive. It works fine as below:
CREATE TABLE test9 (key int, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val#b");
insert into test9 values(1,"hello");
But when the string type is changed to binary as:
CREATE TABLE test2 (key int, val binary)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val#b");
insert into table test2 values(1, 'hello');
the following exception is thrown: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"hello"} ... Caused by: java.lang.RuntimeException: Hive internal error. at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:322) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194) at org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118) at org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282) ... 16 more We should support Hive binary type columns for HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
Yongzhi Chen created HIVE-12189: --- Summary: The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large Key: HIVE-12189 URL: https://issues.apache.org/jira/browse/HIVE-12189 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.1.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Some queries are very slow to compile; for example, the following query
{noformat}
select * from tt1 nf
join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid)
join tt3 a2 on (a2.col2 = a1.col2 and a2.col3 = nf.col3 and a2.hdp_databaseid = nf.hdp_databaseid)
join tt4 a3 on (a3.col4 = a2.col4 and a3.col3 = a2.col3)
join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid)
join tt6 a5 on (a5.col3 = a2.col3 and a5.col2 = a2.col2 and a5.hdp_databaseid = nf.hdp_databaseid)
JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = nf.hdp_databaseid)
JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = nf.hdp_databaseid)
where nf.hdp_databaseid = 102 limit 10;
{noformat}
takes around 120 seconds to compile in hive 1.1 when hive.mapred.mode=strict; hive.optimize.ppd=true; and hive is not in test mode. All the above tables are partitioned by one column, and all of them are empty. If the tables are not empty, the compile is reportedly so slow that it looks like hive is hanging. In hive 2.0 the compile is much faster: explain takes 6.6 seconds. But it is still a lot of time. One of the problems that slows ppd down is that the list in pushdownPreds can grow very large, which gives extractPushdownPreds bad performance:
{noformat}
public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
    Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
{noformat}
While running the query above, at the following breakpoint preds has a size of 12051, and most entries of the list are:
GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
The following code in extractPushdownPreds will clone all the nodes in preds and do the walk. Hive 2.0 is faster because HIVE-11652 makes startWalking much faster, but we still clone thousands of nodes with the same expression. Should we store so many identical predicates in the list, or is just one good enough?
{noformat}
List<Node> startNodes = new ArrayList<Node>();
List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
for (ExprNodeDesc node : preds) {
  ExprNodeDesc clone = node.clone();
  clonedPreds.add(clone);
  exprContext.getNewToOldExprMap().put(clone, node);
}
startNodes.addAll(clonedPreds);
egw.startWalking(startNodes, null);
{noformat}
Should we change the java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java methods public void addFinalCandidate(String alias, ExprNodeDesc expr) and public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) to only add an expr which is not already in the pushdown list for an alias? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12058) Change hive script to record errors when calling hbase fails
Yongzhi Chen created HIVE-12058: --- Summary: Change hive script to record errors when calling hbase fails Key: HIVE-12058 URL: https://issues.apache.org/jira/browse/HIVE-12058 Project: Hive Issue Type: Bug Components: Hive, HiveServer2 Affects Versions: 1.1.0, 0.14.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen By default hive will try to find out which jars need to be added to the classpath in order to run MR jobs against an HBase cluster; however, if hbase can't be found or if hbase mapredcp fails, the hive script will fail silently and ignore some of the jars that should be included in the classpath. That makes it very difficult to analyze the real problem. The hive script should record the error, not just silently redirect the two hbase failures:
HBASE_BIN=${HBASE_BIN:-"$(which hbase 2>/dev/null)"}
$HBASE_BIN mapredcp 2>/dev/null
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false
Yongzhi Chen created HIVE-12008: --- Summary: Make last two tests added by HIVE-11384 pass when hive.in.test is false Key: HIVE-12008 URL: https://issues.apache.org/jira/browse/HIVE-12008 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The last two qfile unit tests fail when hive.in.test is false. It may relate to how we handle the prune list for select: when a select includes every column in a table, the prune list for that select is empty, which may cause issues when calculating its parent's prune list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 38946: Need review the fix for HIVE-11973
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38946/ --- Review request for hive, Chao Sun, Chaoyu Tang, and Szehon Ho. Repository: hive-git Description --- HIVE-11973: IN operator fails when the column type is DATE Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 218b2df3e6bf4d8094d01cf0c78934324a04f1b1 ql/src/test/queries/clientpositive/selectindate.q PRE-CREATION ql/src/test/results/clientpositive/selectindate.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38946/diff/ Testing --- Add new qfile test for the issue and run pre-commit build Thanks, Yongzhi Chen
[jira] [Created] (HIVE-11982) Some test case for union all with recent changes
Yongzhi Chen created HIVE-11982: --- Summary: Some test case for union all with recent changes Key: HIVE-11982 URL: https://issues.apache.org/jira/browse/HIVE-11982 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen The tests throw java.lang.IndexOutOfBoundsException again; this was supposed to be fixed by HIVE-11271. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working
> On Sept. 12, 2015, 1:23 a.m., Szehon Ho wrote: > > I dont know if you saw in the earlier comments, please add a test to the > > file 'FolderPermissionBase' to verify permission inheritance works with the > > feature. Sorry, I overlooked the comments. I added a test to cover this. The fix respects the original design: the destination partition folder inherits the original partition folder's permission. For the intermediate folders between the destination partition folder and the base table folder, if they do not exist, their permission is inherited from the base table folder's (the same behavior as when adding a new partition); otherwise they keep their original permission. - Yongzhi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38216/#review98723 --- On Sept. 12, 2015, 4:07 a.m., Yongzhi Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/38216/ > --- > > (Updated Sept. 12, 2015, 4:07 a.m.) > > > Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. > > > Bugs: HIVE-11745 > https://issues.apache.org/jira/browse/HIVE-11745 > > > Repository: hive-git > > > Description > --- > > Alter table Exchange partition with multiple partition_spec does not work in > cluster mode because in rename, the parent folder for destination path does > not physically exist. Some files system(hdfs for instance) does not > support(or allow) this. Fix by create parent folder first. > > > Diffs > - > > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/FolderPermissionBase.java > f28edc66ea4644c5847ee6abe2e26306f9fbb43e > itests/src/test/resources/testconfiguration.properties > bed621d3eb74f01e54110552f68538afd228018d > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 1840e76cc567e95e1942d912b8ab0db516d63a3b > ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION > ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION > > Diff: https://reviews.apache.org/r/38216/diff/ > > > Testing > --- > > Add minimr unit test. > > > Thanks, > > Yongzhi Chen > >
[jira] [Created] (HIVE-11801) In HMS HA env, "show databases" fails when "current" HMS is stopped.
Yongzhi Chen created HIVE-11801: --- Summary: In HMS HA env, "show databases" fails when "current" HMS is stopped. Key: HIVE-11801 URL: https://issues.apache.org/jira/browse/HIVE-11801 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0, 1.2.0, 0.14.0, 2.0.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Reproduce steps:
# Enable HMS HA on a cluster
# Use beeline to connect to HS2 and execute command {{show databases}}. Don't quit beeline after the command has finished
# Stop the first HMS in configuration {{hive.metastore.uri}}
# Execute {{show databases}} in beeline again. Will get the below error:
{noformat} MetaException(message:Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe) {noformat} The error message in HS2 is as below: {noformat} 2015-09-08 12:06:53,236 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161) at org.apache.thrift.transport.TSaslTransport.flush(TSaslTransport.java:501) at org.apache.thrift.transport.TSaslClientTransport.flush(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.TFilterTransport.flush(TFilterTransport.java:77) at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_databases(ThriftHiveMetastore.java:692) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:684) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:964) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1909) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hive.service.cli.operation.GetSchemasOperation.runInternal(GetSchemasOperation.java:59) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257) at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:462) at org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:296) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:534) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1373) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1358) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159) ... 31 more 2015-09-08 12:06:53,238 ERROR hive.log: Converting exception to MetaException 2015-09-08 12:06:53,238 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error getting schemas: org.apache.hive.service.cli.HiveSQLException: MetaException(message:Got exception: org.apache.thrift.transport
Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working
> On Sept. 10, 2015, 6:02 p.m., Szehon Ho wrote: > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, > > line 2552 > > <https://reviews.apache.org/r/38216/diff/1/?file=1065987#file1065987line2552> > > > > I think this whole method can be moved to FileUtils for organization. > > Also please check if there's any method there already. > > Yongzhi Chen wrote: > I think it may be better as a private method in the HiveMetaStore class > for it will using its private variable wh (hdfs warehouse) . > > Szehon Ho wrote: > Actually looking more into the code, this method should not be necessary. > You can just call wh.mkdirs directly. The underlying FileSystem.mkdirs has > the same semantics as -p, there should be no file system that violates this. > If there were, many other partition codes would break.. Thanks Szehon. As you pointed out, and as the name of the function suggests, wh.mkdirs should behave the same as mkdir -p on all file systems; I had worried too much. The third patch removes the createFullPath method and uses mkdirs directly. I also added a new test case to cover the case when more than one intermediate dir is missing. - Yongzhi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38216/#review98435 ------- On Sept. 11, 2015, 1:02 p.m., Yongzhi Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/38216/ > --- > > (Updated Sept. 11, 2015, 1:02 p.m.) > > > Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. > > > Bugs: HIVE-11745 > https://issues.apache.org/jira/browse/HIVE-11745 > > > Repository: hive-git > > > Description > --- > > Alter table Exchange partition with multiple partition_spec does not work in > cluster mode because in rename, the parent folder for destination path does > not physically exist. Some files system(hdfs for instance) does not > support(or allow) this. Fix by create parent folder first. > > > Diffs > - > > itests/src/test/resources/testconfiguration.properties > bed621d3eb74f01e54110552f68538afd228018d > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 1840e76cc567e95e1942d912b8ab0db516d63a3b > ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION > ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION > > Diff: https://reviews.apache.org/r/38216/diff/ > > > Testing > --- > > Add minimr unit test. > > > Thanks, > > Yongzhi Chen > >
Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38216/ --- (Updated Sept. 12, 2015, 4:07 a.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Changes --- Add a test to verify permission inheritance. Bugs: HIVE-11745 https://issues.apache.org/jira/browse/HIVE-11745 Repository: hive-git Description --- Alter table Exchange partition with multiple partition_spec does not work in cluster mode because, in rename, the parent folder for the destination path does not physically exist. Some file systems (HDFS, for instance) do not support (or allow) this. Fix by creating the parent folder first. Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/FolderPermissionBase.java f28edc66ea4644c5847ee6abe2e26306f9fbb43e itests/src/test/resources/testconfiguration.properties bed621d3eb74f01e54110552f68538afd228018d metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1840e76cc567e95e1942d912b8ab0db516d63a3b ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38216/diff/ Testing --- Add minimr unit test. Thanks, Yongzhi Chen
Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38216/ --- (Updated Sept. 11, 2015, 1:02 p.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-11745 https://issues.apache.org/jira/browse/HIVE-11745 Repository: hive-git Description --- Alter table Exchange partition with multiple partition_spec does not work in cluster mode because, in rename, the parent folder for the destination path does not physically exist. Some file systems (HDFS, for instance) do not support (or allow) this. Fix by creating the parent folder first. Diffs (updated) - itests/src/test/resources/testconfiguration.properties bed621d3eb74f01e54110552f68538afd228018d metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1840e76cc567e95e1942d912b8ab0db516d63a3b ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38216/diff/ Testing --- Add minimr unit test. Thanks, Yongzhi Chen
Re: Review Request 38216: HIVE-11745: Alter table Exchange partition with multiple partition_spec is not working
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/38216/ --- (Updated Sept. 10, 2015, 7:36 p.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-11745 https://issues.apache.org/jira/browse/HIVE-11745 Repository: hive-git Description --- Alter table Exchange partition with multiple partition_spec does not work in cluster mode because, in rename, the parent folder for the destination path does not physically exist. Some file systems (HDFS, for instance) do not support (or allow) this. Fix by creating the parent folder first. Diffs (updated) - itests/src/test/resources/testconfiguration.properties bed621d3eb74f01e54110552f68538afd228018d metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1840e76cc567e95e1942d912b8ab0db516d63a3b ql/src/test/queries/clientpositive/exchgpartition2lel.q PRE-CREATION ql/src/test/results/clientpositive/exchgpartition2lel.q.out PRE-CREATION Diff: https://reviews.apache.org/r/38216/diff/ Testing --- Add minimr unit test. Thanks, Yongzhi Chen
[jira] [Created] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working
Yongzhi Chen created HIVE-11745: --- Summary: Alter table Exchange partition with multiple partition_spec is not working Key: HIVE-11745 URL: https://issues.apache.org/jira/browse/HIVE-11745 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Single partition works, but multiple partitions will not work. Reproduce steps:
{noformat}
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t4;

CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);

INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 10;
INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM jsmall LIMIT 10;

SELECT * FROM t1;
SELECT * FROM t3;

ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
SELECT * FROM t1;
SELECT * FROM t2;

ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
SELECT * FROM t3;
SELECT * FROM t4;
{noformat}
The output:
{noformat}
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
+-------+--------+--------+--+
| t3.a  | t3.d1  | t3.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.227 seconds)
0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
+-------+--------+--------+--+
| t4.a  | t4.d1  | t4.d2  |
+-------+--------+--------+--+
+-------+--------+--------+--+
No rows selected (0.266 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11604) HIVE return wrong results in some queries with PTF function
Yongzhi Chen created HIVE-11604: --- Summary: HIVE return wrong results in some queries with PTF function Key: HIVE-11604 URL: https://issues.apache.org/jira/browse/HIVE-11604 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.1.0, 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen The following query returns an empty result, which is not right:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
After removing row_number() over (partition by id, fkey) as rnum from the query, the right result is returned. Reproduce:
{noformat}
create table tlb1 (id int, fkey int, val string);
create table tlb2 (fid int, name string);
insert into table tlb1 values(100,1,'abc');
insert into table tlb1 values(200,1,'efg');
insert into table tlb2 values(1, 'key1');

select ddd.id, ddd.fkey, aaa.name
from (
  select id, fkey, row_number() over (partition by id, fkey) as rnum
  from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
INFO  : Ended Job = job_local1070163923_0017
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
+---------+-----------+-----------+--+
No rows selected (14.248 seconds)

0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
0: jdbc:hive2://localhost:1> from (
0: jdbc:hive2://localhost:1>   select id, fkey
0: jdbc:hive2://localhost:1>   from tlb1 group by id, fkey
0: jdbc:hive2://localhost:1> ) ddd
0: jdbc:hive2://localhost:1> inner join tlb2 aaa on aaa.fid = ddd.fkey;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
...
INFO  : Ended Job = job_local672340505_0019
+---------+-----------+-----------+--+
| ddd.id  | ddd.fkey  | aaa.name  |
+---------+-----------+-----------+--+
| 100     | 1         | key1      |
| 200     | 1         | key1      |
+---------+-----------+-----------+--+
2 rows selected (14.383 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
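A mitigation sketch (an assumption, not verified in the report: materializing the windowed subquery before the join appears to avoid the bad plan; ddd_tmp is a hypothetical staging table):
{noformat}
create table ddd_tmp as
select id, fkey, row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey;

select ddd.id, ddd.fkey, aaa.name
from ddd_tmp ddd inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}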
[jira] [Created] (HIVE-11502) Map side aggregation is extremely slow
Yongzhi Chen created HIVE-11502: --- Summary: Map side aggregation is extremely slow Key: HIVE-11502 URL: https://issues.apache.org/jira/browse/HIVE-11502 Project: Hive Issue Type: Bug Components: Logical Optimizer, Physical Optimizer Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen For a query such as the following:
{noformat}
create table tbl2 as select col1, max(col2) as col2 from tbl1 group by col1;
{noformat}
if the column for group by has many different values (for example 40), the map side aggregation is very slow. I ran the query, and after more than 3 hours I had to kill it. The same query can finish in 7 seconds if I turn off map side aggregation with:
{noformat}
set hive.map.aggr = false;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11380) NPE when FileSinkOperator is not initialized
Yongzhi Chen created HIVE-11380: --- Summary: NPE when FileSinkOperator is not initialized Key: HIVE-11380 URL: https://issues.apache.org/jira/browse/HIVE-11380 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen When FileSinkOperator's initializeOp is not called (which may happen when an operator before FileSinkOperator failed in its initializeOp), FileSinkOperator will throw an NPE at close time. The stacktrace: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519) ... 18 more {noformat} This exception is misleading and often distracts users from finding the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11384) Add test case covering both HIVE-11271 and HIVE-11333
Yongzhi Chen created HIVE-11384: --- Summary: Add test case covering both HIVE-11271 and HIVE-11333 Key: HIVE-11384 URL: https://issues.apache.org/jira/browse/HIVE-11384 Project: Hive Issue Type: Test Components: Logical Optimizer, Parser Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen
Add test queries that require both HIVE-11271 and HIVE-11333 to be fixed in order to pass.
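As a hypothetical sketch only: a query of the shape below, modeled on the HIVE-11271 reproduce quoted later in this digest, is the kind of query such a test could run; whether this exact shape also exercises HIVE-11333 is an assumption, since that issue's details are not included here.
{noformat}
-- Hypothetical test-query sketch, modeled on the HIVE-11271 reproduce below;
-- coverage of HIVE-11333 is assumed, not verified.
SELECT f1
FROM (
  SELECT f1, if('helloworld' like '%hello%', f1, f2) as filter
  FROM union_all_bug_test_1
  UNION ALL
  SELECT f1, 0 as filter
  FROM union_all_bug_test_2
) A
WHERE (filter = 1);
{noformat}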
[jira] [Created] (HIVE-11319) CTAS with location qualifier overwrites directories
Yongzhi Chen created HIVE-11319: --- Summary: CTAS with location qualifier overwrites directories Key: HIVE-11319 URL: https://issues.apache.org/jira/browse/HIVE-11319 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen
CTAS with a location clause acts as an insert overwrite. This can cause problems when there are subdirectories within the target directory, and it has caused some users to accidentally wipe out directories holding very important data. We should not allow CTAS with a location clause to point to a non-empty directory.
Reproduce:
{noformat}
create table ctas1 location '/Users/ychen/tmp' as select * from jsmall limit 10;
create table ctas2 location '/Users/ychen/tmp' as select * from jsmall limit 5;
{noformat}
Both creates will succeed, but the data in table ctas1 will be accidentally replaced by the data of ctas2.
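Until such a restriction exists, one defensive pattern is to give every CTAS its own fresh directory so one table can never clobber another's files. The sketch below is illustrative only; the per-table subdirectories are hypothetical, not from the report.
{noformat}
-- Illustrative defensive sketch: one fresh directory per CTAS target.
dfs -mkdir -p /Users/ychen/tmp/ctas1;
create table ctas1 location '/Users/ychen/tmp/ctas1' as select * from jsmall limit 10;

dfs -mkdir -p /Users/ychen/tmp/ctas2;
create table ctas2 location '/Users/ychen/tmp/ctas2' as select * from jsmall limit 5;
{noformat}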
[jira] [Created] (HIVE-11271) java.lang.IndexOutOfBoundsException when using union all with the if function
Yongzhi Chen created HIVE-11271: --- Summary: java.lang.IndexOutOfBoundsException when using union all with the if function Key: HIVE-11271 URL: https://issues.apache.org/jira/browse/HIVE-11271 Project: Hive Issue Type: Bug Affects Versions: 1.2.0, 1.0.0, 0.14.0 Reporter: Yongzhi Chen
Some queries with union all as a subquery fail in the MapReduce task with this stack trace:
{noformat}
15/07/15 14:19:30 [pool-13-thread-1]: INFO exec.UnionOperator: Initializing operator UNION[104]
15/07/15 14:19:30 [Thread-72]: INFO mapred.LocalJobRunner: Map task executor complete.
15/07/15 14:19:30 [Thread-72]: WARN mapred.LocalJobRunner: job_local826862759_0005
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140)
    ... 21 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
    ... 21 more
{noformat}
Reproduce:
{noformat}
create table if not exists union_all_bug_test_1 (
  f1 int,
  f2 int
);
create table if not exists union_all_bug_test_2 (
  f1 int
);

SELECT f1
FROM (
  SELECT f1, if('helloworld' like '%hello%', f1, f2) as filter
  FROM union_all_bug_test_1
  union all
  select f1, 0 as filter
  from union_all_bug_test_2
) A
WHERE (filter = 1);
{noformat}